## BigDL-Nano Keras Inference Example
---
This example shows how to use the BigDL-Nano TensorFlow Keras inference pipeline.

In [1]:
from time import time

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers
from bigdl.nano.tf.keras import Model

In [2]:
IMG_SIZE=224
BATCH_SIZE=64
DATASET_NAME="stanford_dogs"

### Loading data
---
Here we load the data with tensorflow_datasets (hereafter TFDS). The Stanford Dogs dataset is available in TFDS as stanford_dogs. It contains 20,580 images across 120 dog breeds (12,000 for training and 8,580 for testing).

In [3]:
# Note: both split entries are "train", so ds_test also holds the training split.
(ds_train, ds_test), ds_info = tfds.load(
    DATASET_NAME,
    data_dir="tensorflow_datasets/",
    split=["train", "train"],
    with_info=True,
    as_supervised=True,
)
NUM_CLASSES = ds_info.features["label"].num_classes
STEPS = len(ds_train) // BATCH_SIZE  # whole batches per epoch

size = (IMG_SIZE, IMG_SIZE)
ds_train = ds_train.map(lambda image, label: (tf.image.resize(image, size), label))
ds_test = ds_test.map(lambda image, label: (tf.image.resize(image, size), label))

2022-05-24 17:34:59.182070: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [4]:
def input_preprocess(image, label):
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label


ds_train = ds_train.map(
    input_preprocess, num_parallel_calls=tf.data.AUTOTUNE
)
ds_test = ds_test.map(input_preprocess)
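
To make the label transformation concrete, here is a minimal pure-Python sketch of what one-hot encoding produces for a single integer label. The helper `one_hot` is a hypothetical stand-in written for illustration; it is not the TensorFlow implementation and only mirrors the result of `tf.one_hot` for a scalar label. One-hot labels are the format that the CategoricalAccuracy metric used later in this notebook expects.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at position `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Example with 5 classes; the notebook itself uses NUM_CLASSES = 120.
encoded = one_hot(2, 5)
print(encoded)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```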

### Loading Model
---
Load the EfficientNetB0 model from the H5 file that was saved with bigdl.nano.tf.keras.Model after single-process training in the nano-keras-fit-example.

In [5]:
model = keras.models.load_model("EfficientNetB0.h5")

In [6]:
start = time()
model.predict(ds_test.batch(BATCH_SIZE))
inference_time_model_basic = time() - start

tcmalloc: large alloc 1073741824 bytes == 0x556536302000 @  0x7f239fc97d3f 0x7f239fcce0c0 0x7f239fcd1082 0x7f239fcd1243 0x7f2382062402 0x7f23763fceb0 0x7f237641d0b5 0x7f23764209ea 0x7f2376420f69 0x7f23764212d1 0x7f2376415ce3 0x7f2371adb051 0x7f237193638d 0x7f23716c9087 0x7f23716c991e 0x7f23716c9b1d 0x7f237d0babf5 0x7f2371adcd7c 0x7f2371a66cec 0x7f2376e2976e 0x7f2376e261f3 0x7f23721b8313 0x7f239fc2e609 0x7f239fb53163
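
The cell above times a single call with `time()`. As a hedged alternative, the sketch below wraps the same pattern in a small helper built on `time.perf_counter`, a monotonic clock meant for measuring intervals. `timed` is a hypothetical helper name, and the `sum` workload is only a placeholder standing in for `model.predict(...)`.

```python
from time import perf_counter


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = fn(*args, **kwargs)
    return result, perf_counter() - start


# Placeholder workload. In the notebook this could be, for example:
#   _, inference_time_model_basic = timed(model.predict, ds_test.batch(BATCH_SIZE))
result, elapsed = timed(sum, range(1_000_000))
print(f"result={result}, elapsed={elapsed:.4f}s")
```

A single measurement includes one-off costs such as graph tracing, so repeating the call and comparing runs may give a steadier number.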


### Quantize Model
---
Use Model.quantize from bigdl.nano.tf.keras to apply post-training quantization to a Keras model. Static quantization calibrates the model on a calibration dataset.

The parameters used in this notebook are:
```
    :param calib_dataset:    A tf.data.Dataset object for calibration. Required for
                             static quantization.
    :param val_dataset:      A tf.data.Dataset object for evaluation.
    :param batch:            Batch size of dataloader for both calib_dataset and val_dataset.
    :param metric:           A Metric object for evaluation.
    :param tuning_strategy:  'bayesian', 'basic', 'mse', 'sigopt'. Default: 'bayesian'.
```
For more details, see the [source](https://github.com/intel-analytics/BigDL/blob/main/python/nano/src/bigdl/nano/tf/quantization.py#L22).
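
To illustrate what calibration means in post-training static quantization, here is a simplified pure-Python sketch. It records the minimum and maximum of the values seen in a calibration set, derives a scale and zero point from that range, and maps floats to 8-bit integers and back. This is a generic affine-quantization illustration under stated assumptions (one range per tensor, unsigned 8-bit values, plain min/max calibration). It is not the algorithm that Model.quantize or its backend runs, and every function name below is hypothetical.

```python
def calibrate(samples):
    """Return the (min, max) observed over all calibration samples.

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    return min(lo, 0.0), max(hi, 0.0)


def quant_params(lo, hi, num_bits=8):
    """Derive the scale and zero point for an unsigned integer grid."""
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point, qmax


def quantize(x, scale, zero_point, qmax):
    """Map a float to the nearest integer level, clamped to [0, qmax]."""
    q = round(x / scale) + zero_point
    return max(0, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer level back to an approximate float."""
    return (q - zero_point) * scale


# Toy calibration data standing in for activations seen during calibration.
calib = [[-1.0, 0.5, 2.0], [0.25, -0.75, 3.0]]
lo, hi = calibrate(calib)
scale, zp, qmax = quant_params(lo, hi)

for x in [-1.0, 0.0, 1.3, 3.0]:
    q = quantize(x, scale, zp, qmax)
    print(f"x={x:+.2f} -> q={q:3d} -> x'={dequantize(q, scale, zp):+.4f}")
```

Values inside the calibrated range are recovered to within about half a quantization step; values outside it are clamped, which is why the calibration data should be representative of real inputs.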

In [7]:
model_quantized = model.quantize(calib_dataset=ds_test,
                                 val_dataset=ds_test,
                                 batch=BATCH_SIZE,
                                 metric=tf.keras.metrics.CategoricalAccuracy(),
                                 tuning_strategy='bayesian',
                                 )

2022-05-24 17:35:45.590811: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-05-24 17:35:45.590985: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-05-24 17:35:45.614230: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.007ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-05-24 17:35:46.773445: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-05-24 17:35:46.773615: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-05-24 17:35:46.931195: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:114

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`


2022-05-24 17:35:49 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 444.82 ms
2022-05-24 17:35:49 [INFO] Pass GraphCseOptimizer elapsed time: 122.43 ms
2022-05-24 17:35:50 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 794.4 ms
2022-05-24 17:35:50 [INFO] Pass UpdateEnterOptimizer elapsed time: 23.9 ms
2022-05-24 17:35:50 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 27.12 ms
2022-05-24 17:35:50 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 27.05 ms
2022-05-24 17:35:50 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 27.66 ms
2022-05-24 17:35:50 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 27.57 ms
2022-05-24 17:35:50 [INFO] Pass ExpandDimsOptimizer elapsed time: 27.12 ms
2022-05-24 17:35:50 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 27.55 ms
2022-05-24 17:35:50 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 26.76 ms
2022-05-24 17:35:50 [INFO] Pass Pre O

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.




INFO:tensorflow:No assets to save.


2022-05-24 17:36:42 [INFO] No assets to save.


INFO:tensorflow:No assets to write.


2022-05-24 17:36:42 [INFO] No assets to write.


INFO:tensorflow:SavedModel written to: /home/projects/notebooks/nc_workspace/2022-05-24_17-35-44/fp32_logged_graph/saved_model.pb


2022-05-24 17:36:42 [INFO] SavedModel written to: /home/projects/notebooks/nc_workspace/2022-05-24_17-35-44/fp32_logged_graph/saved_model.pb
2022-05-24 17:36:42 [INFO] Save quantized model to /home/projects/notebooks/nc_workspace/2022-05-24_17-35-44/fp32_logged_graph.










Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.




INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2022-05-24 17:36:42 [INFO] Saver not created because there are no variables in the graph to restore


INFO:tensorflow:The specified SavedModel has no variables; no checkpoints were restored.


2022-05-24 17:36:42 [INFO] The specified SavedModel has no variables; no checkpoints were restored.
2022-05-24 17:36:42 [INFO] Start sampling on calibration dataset.
2022-05-24 17:41:57 [INFO] Start sampling on calibration dataset.
;EfficientNet/stem_conv/Conv2D_eightbit_min_EfficientNet/stem_conv_pad/Pad__print__;__min:[0]
;EfficientNet/stem_conv/Conv2D_eightbit_max_EfficientNet/stem_conv_pad/Pad__print__;__max:[1]
;EfficientNet/stem_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-16.6541328]
;EfficientNet/stem_conv/Conv2D_eightbit_requant_range__print__;__requant_max:[21.5224571]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_min_EfficientNet/block1a_se_reshape/Reshape__print__;__min:[-0.275394917]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_max_EfficientNet/block1a_se_reshape/Reshape__print__;__max:[15.0205

;EfficientNet/block5c_se_reduce/Conv2D_eightbit_min_EfficientNet/block5c_se_reshape/Reshape__print__;__min:[-0.27220574]
;EfficientNet/block5c_se_reduce/Conv2D_eightbit_max_EfficientNet/block5c_se_reshape/Reshape__print__;__max:[4.43657398]
;EfficientNet/block5c_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_min:[-2.50568628]
;EfficientNet/block5c_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_max:[5.40981197]
;EfficientNet/block5c_se_expand/Conv2D_eightbit_min_EfficientNet/block5c_se_reduce/mul__print__;__min:[-0.278452218]
;EfficientNet/block5c_se_expand/Conv2D_eightbit_max_EfficientNet/block5c_se_reduce/mul__print__;__max:[5.38572454]
;EfficientNet/block5c_se_expand/Conv2D_eightbit_requant_range__print__;__requant_min:[-4.71783686]
;EfficientNet/block5c_se_expand/Conv2D_eightbit_requant_range__print__;__requant_max:[2.41187549]
;EfficientNet/block5c_project_conv/Conv2D_eightbit_min_EfficientNet/block5c_se_excite/mul__print__;__min:[-0.255543441]
;EfficientNet

;EfficientNet/block2b_se_reduce/Conv2D_eightbit_min_EfficientNet/block2b_se_reshape/Reshape__print__;__min:[-0.278456897]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_max_EfficientNet/block2b_se_reshape/Reshape__print__;__max:[3.63701963]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_min:[-0.0524663553]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_max:[6.17646885]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_min_EfficientNet/block2b_se_reduce/mul__print__;__min:[-0.0255451556]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_max_EfficientNet/block2b_se_reduce/mul__print__;__max:[6.16366243]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_min:[-5.11612701]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_max:[5.82428598]
;EfficientNet/block2b_project_conv/Conv2D_eightbit_min_EfficientNet/block2b_se_excite/mul__print__;__min:[-0.276271164]
;Efficien

2022-05-24 17:41:59 [INFO] Pass QuantizedRNNConverter elapsed time: 20.27 ms
2022-05-24 17:42:00 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 46.63 ms
2022-05-24 17:42:00 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 18.29 ms
2022-05-24 17:42:00 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 14.81 ms
2022-05-24 17:42:00 [INFO] Pass MetaOpOptimizer elapsed time: 8.6 ms
2022-05-24 17:42:00 [INFO] Pass PostCseOptimizer elapsed time: 132.29 ms
2022-05-24 17:42:00 [INFO] |**********Mixed Precision Statistics*********|
2022-05-24 17:42:00 [INFO] +-----------------------+-------+------+------+
2022-05-24 17:42:00 [INFO] |        Op Type        | Total | INT8 | FP32 |
2022-05-24 17:42:00 [INFO] +-----------------------+-------+------+------+
2022-05-24 17:42:00 [INFO] |         Conv2D        |   65  |  65  |  0   |
2022-05-24 17:42:00 [INFO] | DepthwiseConv2dNative |   16  |  0   |  16  |
2022-05-24 17:42:00 [INFO] |         MatMul        |   1   |  1   |  0   |
2022-05-

In [8]:
# Random float64 images with the dataset's shape, used only to time the quantized model
images = np.random.random((len(ds_test), IMG_SIZE, IMG_SIZE, 3))

tcmalloc: large alloc 14450688000 bytes == 0x55670bb92000 @  0x7f239fc97d3f 0x7f239fcce0c0 0x7f239fcce61a 0x7f2394fa1ead 0x7f2394fa1f27 0x7f2394fe4794 0x7f2394fe531f 0x7f239507ac25 0x7f2394b120bc 0x7f2394bc4fc8 0x5564e5f2adb5 0x7f2394bc4c71 0x5564e5f67fc4 0x5564e5f680e1 0x5564e5fc41cc 0x5564e5f05059 0x5564e5f05f24 0x5564e5f05f4c 0x5564e5fcef7d 0x5564e5f67e49 0x5564e5f680e1 0x5564e5fc3b14 0x5564e5f61092 0x5564e5fc0dda 0x5564e5f61092 0x5564e5fc0dda 0x5564e5f61092 0x5564e5f67dec 0x5564e5f6814f 0x5564e5fc3fa9 0x5564e5f6707b
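
The size in the tcmalloc message above can be checked by hand. `np.random.random` returns float64 values (8 bytes each), so the allocation is the number of images times 224 x 224 x 3 x 8 bytes. The sketch below does that arithmetic in plain Python. The image count of 12,000 is inferred from the byte count in the log rather than stated in the notebook, `array_bytes` is a hypothetical helper, and the float32 figure is shown only for comparison.

```python
def array_bytes(shape, bytes_per_element):
    """Return the memory needed for a dense array of the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


shape = (12_000, 224, 224, 3)          # assumed: 12,000 images of 224 x 224 x 3
float64_bytes = array_bytes(shape, 8)  # dtype produced by np.random.random
float32_bytes = array_bytes(shape, 4)  # same data stored as float32

print(float64_bytes)  # 14450688000
print(float32_bytes)  # 7225344000
print(f"{float64_bytes / 2**30:.2f} GiB as float64")
```

The float32 figure equals the 7225344000-byte allocation logged during the quantized run that follows, which would be consistent with the input being converted to float32 before it is fed to the session, although the log alone does not confirm that.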


In [9]:
start = time()
with model_quantized.sess as sess:
    sess.run(model_quantized.output_tensor,
             feed_dict={model_quantized.input_tensor[0]: images})
infer_time_model_quantized = time() - start

tcmalloc: large alloc 7225344000 bytes == 0x556a698d2000 @  0x7f239fc97d3f 0x7f239fcce0c0 0x7f239fcce61a 0x7f2394fa1ead 0x7f2394fa1f27 0x7f2394fe4794 0x7f2394fe4aca 0x7f2394fe4cd3 0x7f2394fe7f10 0x7f2394fe812f 0x7f239507ffa7 0x5564e5f67f7c 0x5564e5f680e1 0x5564e5fc4782 0x5564e5f6707b 0x5564e5fbf8b0 0x5564e5f05059 0x5564e5f67307 0x5564e5fc0841 0x5564e5f05059 0x5564e5f05f24 0x5564e5f05f4c 0x5564e5fcef7d 0x5564e5f67e49 0x5564e5f680e1 0x5564e5fc3b14 0x5564e5f61092 0x5564e5fc0dda 0x5564e5f61092 0x5564e5fc0dda 0x5564e5f61092
tcmalloc: large alloc 8589934592 bytes == 0x556c18b72000 @  0x7f239fc97d3f 0x7f239fcce0c0 0x7f239fcd1082 0x7f239fcd1243 0x7f2382062402 0x7f23763fceb0 0x7f237641ca93 0x7f23764209ea 0x7f2376420f69 0x7f23764212d1 0x7f2376415ce3 0x7f2371adb051 0x7f237193638d 0x7f23716c9087 0x7f23716c991e 0x7f23716c9b1d 0x7f23789a2bf7 0x7f237a8c83ee 0x7f2371adcd7c 0x7f2371a66cec 0x7f2376e2976e 0x7f2376e261f3 0x7f23721b8313 0x7f239fc2e609 0x7f239fb53163

In [11]:
template = """
|    Precision   | Inference Time(s) |
|      FP32      |       {:5.2f}       |
|      INT8      |       {:5.2f}       |
| Improvement(%) |       {:5.2f}       |
"""
summary = template.format(
    inference_time_model_basic,
    infer_time_model_quantized,
    (1 - infer_time_model_quantized /inference_time_model_basic) * 100
)
print(summary)


|    Precision   | Inference Time(s) |
|      FP32      |       41.29       |
|      INT8      |       38.03       |
| Improvement(%) |        7.89       |
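
The improvement row follows from the formula used in the cell above: (1 - INT8 time / FP32 time) * 100. Here is a minimal sketch that recomputes it from the rounded values printed in the table. The function names are hypothetical, and because the inputs are rounded to two decimals, the result can differ from the printed 7.89 in the last digit.

```python
def improvement_pct(baseline_s, optimized_s):
    """Percentage of the baseline time saved by the optimized run."""
    return (1 - optimized_s / baseline_s) * 100


def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


fp32_s, int8_s = 41.29, 38.03  # rounded times from the table above
print(f"improvement: {improvement_pct(fp32_s, int8_s):.2f}%")
print(f"speedup:     {speedup(fp32_s, int8_s):.3f}x")
```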

