## BigDL-Nano Keras Inference Example
---
This example shows the usage of BigDL-Nano Tensorflow Keras inference pipeline.

In [1]:
import os
from time import time

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers
from bigdl.nano.tf.keras import Model

In [2]:
IMG_SIZE=224
BATCH_SIZE=64
DATASET_NAME="stanford_dogs"

### Loading data
---
Here we load data from tensorflow_datasets (hereafter TFDS). Stanford Dogs dataset is provided in TFDS as stanford_dogs. It features 20,580 images that belong to 120 classes of dog breeds (12,000 for training and 8,580 for testing).

In [3]:
(_, ds_test), ds_info=tfds.load(
    DATASET_NAME, data_dir="tensorflow_datasets/" ,split=["train", "train"], with_info=True, as_supervised=True
)
NUM_CLASSES = ds_info.features["label"].num_classes

size = (IMG_SIZE, IMG_SIZE)
ds_test = ds_test.map(lambda image, label: (tf.image.resize(image, size), label))

2022-06-07 08:30:34.765214: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [4]:
def input_preprocess(image, label):
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label

ds_test = ds_test.map(input_preprocess)
TEST_STEPS = int(os.environ.get('TEST_STEPS', len(ds_test)))
ds_test = ds_test.take(TEST_STEPS)

### Loading Model
---
Load the EfficientNetB0 Model using the H5 file saving using bigdl.nano.tf.keras.Model after single process training in the nano-keras-fit-example.

In [5]:
model = keras.models.load_model("EfficientNetB0.h5")

In [6]:
start = time()
model.predict(ds_test.batch(64))
inference_time_model_basic = time() - start

tcmalloc: large alloc 1073741824 bytes == 0x55b28622c000 @  0x7f7e60cc8d3f 0x7f7e60cff0c0 0x7f7e60d02082 0x7f7e60d02243 0x7f7e43092402 0x7f7e3742ceb0 0x7f7e3744d0b5 0x7f7e374509ea 0x7f7e37450f69 0x7f7e374512d1 0x7f7e37445ce3 0x7f7e32b0b051 0x7f7e3296638d 0x7f7e326f9087 0x7f7e326f991e 0x7f7e326f9b1d 0x7f7e3e0eabf5 0x7f7e32b0cd7c 0x7f7e32a96cec 0x7f7e37e5976e 0x7f7e37e561f3 0x7f7e331e8313 0x7f7e60c5e609 0x7f7e60b83133


### Quantize Model
---
Use Model.quantize from bigdl.nano.tf.keras to calibrate a keras model for post-training quantization.<br>
Here are the parameters used in the notebook:
```
    :param calib_dataset:  A tf.data.Dataset object for calibration. Required for
                            static quantization.
    :param val_dataset:    A tf.data.Dataset object for evaluation.
    :param batch:          Batch size of dataloader for both calib_dataset and val_dataset.
    :param metric:         A Metric object for evaluation.
    
    :param tuning_strategy:    'bayesian', 'basic', 'mse', 'sigopt'. Default: 'bayesian'.
    
```
Access more details from [Source](https://github.com/intel-analytics/BigDL/blob/main/python/nano/src/bigdl/nano/tf/quantization.py#L22)

In [7]:
model_quantized = model.quantize(calib_dataset=ds_test.batch(64),
                                 metric=tf.keras.metrics.CategoricalAccuracy(),
                                 tuning_strategy='bayesian'
                                 )

2022-06-07 04:23:07.861366: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-06-07 04:23:07.861591: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-07 04:23:07.902605: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.016ms.
  function_optimizer: function_optimizer did nothing. time = 0.003ms.

2022-06-07 04:23:09.256227: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-06-07 04:23:09.256405: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-07 04:23:09.426666: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:114

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`


Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-06-07 04:23:11 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 464.52 ms
2022-06-07 04:23:11 [INFO] Pass GraphCseOptimizer elapsed time: 120.49 ms
2022-06-07 04:23:12 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 783.94 ms
2022-06-07 04:23:12 [INFO] Pass UpdateEnterOptimizer elapsed time: 28.49 ms
2022-06-07 04:23:12 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 32.47 ms
2022-06-07 04:23:12 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 32.76 ms
2022-06-07 04:23:12 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 33.09 ms
2022-06-07 04:23:12 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 32.58 ms
2022-06-07 04:23:12 [INFO] Pass ExpandDimsOptimizer elapsed time: 32.53 ms
2022-06-07 04:23:12 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 32.9 ms
2022-06-07 04:23:13 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 31.81 ms
2022-06-07 04:23:13 [INFO] Pass Pre 

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


INFO:tensorflow:No assets to save.


2022-06-07 04:24:14 [INFO] No assets to save.


INFO:tensorflow:No assets to write.


2022-06-07 04:24:14 [INFO] No assets to write.


INFO:tensorflow:SavedModel written to: /home/projects/BigDL/python/nano/notebooks/tensorflow/stanford_dogs/nc_workspace/2022-06-07_04-23-06/fp32_logged_graph/saved_model.pb


2022-06-07 04:24:14 [INFO] SavedModel written to: /home/projects/BigDL/python/nano/notebooks/tensorflow/stanford_dogs/nc_workspace/2022-06-07_04-23-06/fp32_logged_graph/saved_model.pb
2022-06-07 04:24:14 [INFO] Save quantized model to /home/projects/BigDL/python/nano/notebooks/tensorflow/stanford_dogs/nc_workspace/2022-06-07_04-23-06/fp32_logged_graph.










Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2022-06-07 04:24:14 [INFO] Saver not created because there are no variables in the graph to restore


INFO:tensorflow:The specified SavedModel has no variables; no checkpoints were restored.


2022-06-07 04:24:14 [INFO] The specified SavedModel has no variables; no checkpoints were restored.
2022-06-07 04:24:14 [INFO] Start sampling on calibration dataset.
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

6 -1.09688187 -0.616288185 -0.0822677612 -1.86062241 1.65286446 0.554042816 -0.420948029 1.24310303][0.111104965 2.86409283 -0.156423569 -1.700737 0.380524635 -2.70877838 -0.516951561 0.0402984619 -0.690979 -1.68783712 -0.0647420883 -1.27616501 1.82960606 -1.30992985 -0.909689903 1.42675495][0.945001602 2.8702898 -0.281414032 -1.27297592 -0.153067589 -2.64780807 -0.562912 0.185821533 -0.866857529 0.649182796 -0.258250713 -1.48960876 1.59622 0.242272854 -0.75803566 0.506381035][0.0592746735 2.62710381 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

;EfficientNet/block4b_se_reduce/BiasAdd__print__;__KL:[[[[0.995156288 -0.23455216 2.72230506 1.74978304 1.42368829 0.9933815 3.16343665 2.17851853 2.11126399 4.92242479 -0.0494797304 0.0540683642 2.98959351 1.06583166 0.427687 3.48294806 1.64894235 1.71413434 3.14434433 3.45542097]]][[[1.37665105 0.748962522 1.83821833 1.54843318 3.21531653 0.853180766 3.68292308 1.89316511 2.28953218 4.89517879 -1.41290367 0.835762501 3.25347209 1.78371716 1.39189887 2.84778547 3.18422055 2.35034013 1.98875785 2.97447133]]][[[2.17791629 0.645867288 1.20781064 1.17795372 2.36260724 0.947118878 3.31582713 3.70996857 1.38561571 5.23891783 -1.36443746 -0.225725427 3.52530932 1.217239

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

;EfficientNet/block5b_se_expand/BiasAdd__print__;__KL:[[[[0.043577224 0.276869506 0.111539215 -2.12991595 -2.20253229 -0.646009147 -0.0925698 -0.352288067 -0.906201184 -0.125935227 -0.0693349615 0.370096534 -2.37758064 -0.100445345 -1.47993171 -1.10673857 -0.265054226 0.0435506813 -1.4440856 -0.00720765907 -1.0398314 -1.16488922 0.68871361 -3.60427713 -1.28818309 1.4494319 -0.77619344 -0.0354833715 -0.0481501594 -0.275038809 -0.655816376 0.858021 -1.4013648 -2.46051 -0.185950145 -0.0882124752 -0.758475423 -0.563088894 -1.25527167 -0.592543542 -1.8535099 -1.53536892 -0.257168531 1.19143772 -2.18824887 -1.65824604 -1.18444419 -0.426623881 -0.560840547 0.00964555331 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

;EfficientNet/block5c_se_expand/BiasAdd__print__;__KL:[[[[-2.73600554 -0.321487367 -0.70625484 -0.429918736 -0.282983333 -0.660471499 -0.243861094 -0.351775587 -0.593045771 -2.73209119 -1.62508106 -0.858894706 -1.09293675 -1.0396198 -0.371305794 -1.15430319 -2.28955054 0.394709647 -2.08604646 -0.585074723 -0.404741526 -0.83419621 -0.920102954 -0.330669671 -0.713725 -1.68443549 -0.529329896 -0.672015309 -3.09922171 0.131374612 -0.15021278 -2.00217199 0.057498619 -0.208153069 -0.741317868 -1.54794502 -1.04513049 0.254311353 -1.22248316 -2.21628451 -1.56706917 0.864998579 -0.775503695 -0.374890119 -1.26918733 -1.21885753 0.869731724 -0.743793249 -0.949761331 -0.03672

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

06 1.16604018 -0.393732756 0.608234227 2.56603479 -0.13202922 2.13033056 0.0182729904 -1.96944916 -0.824772358 0.980933547 0.717516363 0.337967604 1.00254607 0.523405135 1.62807178 -0.685420811 1.48907018 2.43421125 -0.920809865 0.137145922 1.5794009 1.49694991 0.55171 -0.333858073 0.48080951 2.73443031 0.754579425 -1.32623792 2.12253284 0.172215775 2.22534347 0.429344535 3.59464478 -0.0412164405 1.53132463 2.5818 -0.651043057 -1.32109964 -1.26355612 -0.604933441 1.53056896 1.01569021]]][[[-3.03665304 2.72969508 -4.5502944 1.44467413 5.45282698 -2.95272326 2.53502178 1.11748242 -0.273912549 3.306458 0.292781293 3.67755532 -2.95396948 -5.22428465 0.495607 2.2805845

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

 -2.25904512 -2.52339649 -0.329480648 -0.921312094 -3.20745683 -1.53407359 -2.88393164 -0.920381546 -1.76087582 -3.53810358 -3.23206306 -1.36140394 1.12078786 -2.33845758 0.883744955 -1.38437438 -1.18397939 -4.29397964 3.3362329 -1.31663752 -1.9810611 -0.620040655 -4.59684896 0.694036245 1.07939231 -1.72338772 1.79178536 -2.36944103 -1.76575375 0.563286901 -3.49270582 0.23149848 -2.75253177 -0.678557217 -3.64825 -2.01377606 -2.36523294 -0.675490379 -1.65436399 -1.21261716 -3.16678715 2.01708484 0.33205843 -0.308283687 -2.98061395 -1.52169156 -4.54607964 -2.56634665 2.93084192 -4.22359085 -2.34777355 -0.914489508 -4.46421623 -1.26459062 -0.6282444 0.152509212 -1.42

2022-06-07 04:29:29 [INFO] Start sampling on calibration dataset.
;EfficientNet/stem_conv/Conv2D_eightbit_min_EfficientNet/stem_conv_pad/Pad__print__;__min:[0]
;EfficientNet/stem_conv/Conv2D_eightbit_max_EfficientNet/stem_conv_pad/Pad__print__;__max:[1]
;EfficientNet/stem_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-16.6541328]
;EfficientNet/stem_conv/Conv2D_eightbit_requant_range__print__;__requant_max:[21.5224571]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_min_EfficientNet/block1a_se_reshape/Reshape__print__;__min:[-0.275394917]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_max_EfficientNet/block1a_se_reshape/Reshape__print__;__max:[15.0205612]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_min:[-4.10535812]
;EfficientNet/block1a_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_max:[-1.48294878]
;EfficientNet/block1a_se_expand/Conv2D_eightbit_min_EfficientNet/block1a_se_reduce/mul__print__;__min:[-0.27431944]
;Efficien

;EfficientNet/block5c_project_conv/Conv2D_eightbit_min_EfficientNet/block5c_se_excite/mul__print__;__min:[-0.255543441]
;EfficientNet/block5c_project_conv/Conv2D_eightbit_max_EfficientNet/block5c_se_excite/mul__print__;__max:[12.9150925]
;EfficientNet/block5c_project_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-19.5194569]
;EfficientNet/block5c_project_conv/Conv2D_eightbit_requant_range__print__;__requant_max:[20.3025894]
;EfficientNet/block6a_expand_conv/Conv2D_eightbit_min_EfficientNet/block5c_add/add__print__;__min:[-38.8998795]
;EfficientNet/block6a_expand_conv/Conv2D_eightbit_max_EfficientNet/block5c_add/add__print__;__max:[39.5121269]
;EfficientNet/block6a_expand_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-11.5431557]
;EfficientNet/block6a_expand_conv/Conv2D_eightbit_requant_range__print__;__requant_max:[13.6583843]
;EfficientNet/block6a_se_reduce/Conv2D_eightbit_min_EfficientNet/block6a_se_reshape/Reshape__print__;__min:[-0.271459937]
;EfficientNet

;EfficientNet/block2b_se_reduce/Conv2D_eightbit_min_EfficientNet/block2b_se_reshape/Reshape__print__;__min:[-0.278456897]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_max_EfficientNet/block2b_se_reshape/Reshape__print__;__max:[3.63701963]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_min:[-0.0524663553]
;EfficientNet/block2b_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_max:[6.17646885]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_min_EfficientNet/block2b_se_reduce/mul__print__;__min:[-0.0255451556]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_max_EfficientNet/block2b_se_reduce/mul__print__;__max:[6.16366243]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_min:[-5.11612701]
;EfficientNet/block2b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_max:[5.82428598]
;EfficientNet/block2b_project_conv/Conv2D_eightbit_min_EfficientNet/block2b_se_excite/mul__print__;__min:[-0.276271164]
;Efficien

2022-06-07 04:29:31 [INFO] Pass QuantizedRNNConverter elapsed time: 19.57 ms
2022-06-07 04:29:32 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 41.59 ms
2022-06-07 04:29:32 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 15.77 ms
2022-06-07 04:29:32 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 13.76 ms
2022-06-07 04:29:32 [INFO] Pass MetaOpOptimizer elapsed time: 8.11 ms
2022-06-07 04:29:32 [INFO] Pass PostCseOptimizer elapsed time: 129.13 ms
2022-06-07 04:29:32 [INFO] |**********Mixed Precision Statistics*********|
2022-06-07 04:29:32 [INFO] +-----------------------+-------+------+------+
2022-06-07 04:29:32 [INFO] |        Op Type        | Total | INT8 | FP32 |
2022-06-07 04:29:32 [INFO] +-----------------------+-------+------+------+
2022-06-07 04:29:32 [INFO] |         Conv2D        |   65  |  65  |  0   |
2022-06-07 04:29:32 [INFO] | DepthwiseConv2dNative |   16  |  0   |  16  |
2022-06-07 04:29:32 [INFO] |         MatMul        |   1   |  1   |  0   |
2022-06

In [8]:
images = np.random.random((len(ds_test), 224, 224, 3))

tcmalloc: large alloc 14489223168 bytes == 0x55b4817ae000 @  0x7f7e60cc8d3f 0x7f7e60cff0c0 0x7f7e60cff61a 0x7f7e55fd1ead 0x7f7e55fd1f27 0x7f7e56014794 0x7f7e5601531f 0x7f7e560aac25 0x7f7e55b420bc 0x7f7e55bf4fc8 0x55b236473db5 0x7f7e55bf4c71 0x55b2364b0fc4 0x55b2364b10e1 0x55b23650d1cc 0x55b23644e059 0x55b23644ef24 0x55b23644ef4c 0x55b236517f7d 0x55b2364b0e49 0x55b2364b10e1 0x55b23650cb14 0x55b2364aa092 0x55b236509dda 0x55b2364aa092 0x55b236509dda 0x55b2364aa092 0x55b2364b0dec 0x55b2364b114f 0x55b23650cfa9 0x55b2364b007b


In [9]:
start = time()
with model_quantized.sess as sess:
    sess.run(model_quantized.output_tensor,
             feed_dict={model_quantized.input_tensor[0]: images})
infer_time_model_quantized = time() - start

tcmalloc: large alloc 7244611584 bytes == 0x55b7e19ae000 @  0x7f7e60cc8d3f 0x7f7e60cff0c0 0x7f7e60cff61a 0x7f7e55fd1ead 0x7f7e55fd1f27 0x7f7e56014794 0x7f7e56014aca 0x7f7e56014cd3 0x7f7e56017f10 0x7f7e5601812f 0x7f7e560affa7 0x55b2364b0f7c 0x55b2364b10e1 0x55b23650d782 0x55b2364b007b 0x55b2365088b0 0x55b23644e059 0x55b2364b0307 0x55b236509841 0x55b23644e059 0x55b23644ef24 0x55b23644ef4c 0x55b236517f7d 0x55b2364b0e49 0x55b2364b10e1 0x55b23650cb14 0x55b2364aa092 0x55b236509dda 0x55b2364aa092 0x55b236509dda 0x55b2364aa092
tcmalloc: large alloc 8589934592 bytes == 0x55b991eae000 @  0x7f7e60cc8d3f 0x7f7e60cff0c0 0x7f7e60d02082 0x7f7e60d02243 0x7f7e43092402 0x7f7e3742ceb0 0x7f7e3744ca93 0x7f7e374509ea 0x7f7e37450f69 0x7f7e374512d1 0x7f7e37445ce3 0x7f7e32b0b051 0x7f7e3296638d 0x7f7e326f9087 0x7f7e326f991e 0x7f7e326f9b1d 0x7f7e399d2bf7 0x7f7e3b8f83ee 0x7f7e32b0cd7c 0x7f7e32a96cec 0x7f7e37e5976e 0x7f7e37e561f3 0x7f7e331e8313 0x7f7e60c5e609 0x7f7e60b83133
tcmalloc: large alloc 8589934592 bytes =

In [10]:
template = """
|    Precision   | Inference Time(s) |
|      FP32      |       {:5.2f}       |
|      INT8      |       {:5.2f}       |
| Improvement(%) |       {:5.2f}       |
"""
summary = template.format(
    inference_time_model_basic,
    infer_time_model_quantized,
    (1 - infer_time_model_quantized /inference_time_model_basic) * 100
)
print(summary)


|    Precision   | Inference Time(s) |
|      FP32      |       43.63       |
|      INT8      |       38.53       |
| Improvement(%) |       11.69       |

