[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/tensorflow/tensorflow_inference_optimizer_optimize.ipynb)

# Find the Acceleration Method with the Minimum Inference Latency for a TensorFlow Model Using InferenceOptimizer

This example illustrates how to use `InferenceOptimizer` to quickly find the acceleration method with the minimum inference latency for a trained TensorFlow model, with or without restrictions. Calling `optimize()` runs all available acceleration combinations that BigDL-Nano provides for inference. Calling `get_best_model()` then returns the best model, with or without restrictions.

To run inference with BigDL-Nano InferenceOptimizer, the following packages need to be installed first. We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to prepare the environment and installing the packages in a conda environment.

You can create a conda environment by executing:

```bash
# "nano" is conda environment name, you can use any name you like.
conda create -n nano python=3.7 setuptools=58.0.4  
conda activate nano
pip install --pre --upgrade bigdl-nano[tensorflow,inference]
```


Then initialize the environment variables with the `bigdl-nano-init` script, which is installed with bigdl-nano.

```bash
source bigdl-nano-init
```

First, prepare the model and dataset. In this example, we use an [EfficientNetB0 model](https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet/EfficientNetB0) pretrained on the ImageNet dataset and train it on [Imagenette](https://www.tensorflow.org/datasets/catalog/imagenette).

In [1]:
from bigdl.nano.tf.keras import Model
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB0
from tqdm import tqdm

def prepare_dataset(img_size=224, batch_size=512):
    (ds_train, ds_test), ds_info = tfds.load(
        'imagenette/320px-v2',
        data_dir="./data/",
        split=['train', 'validation'],
        with_info=True,
        as_supervised=True
    )

    num_classes = ds_info.features['label'].num_classes

    def preprocessing(input_img, label):
        return tf.image.resize(input_img, (img_size, img_size)), tf.one_hot(label, num_classes)

    AUTOTUNE = tf.data.AUTOTUNE
    ds_train_batched = ds_train.shuffle(1000).map(preprocessing).batch(batch_size, drop_remainder=False).prefetch(AUTOTUNE)
    ds_test_batched = ds_test.map(preprocessing).batch(batch_size, drop_remainder=False).prefetch(AUTOTUNE)
    calib_set = ds_train.map(preprocessing).prefetch(AUTOTUNE)

    for img_t, lbl_t in tqdm(ds_test_batched):
        valid_data = tf.data.Dataset.from_tensor_slices((img_t, lbl_t))
        break

    return ds_train_batched, ds_test_batched, calib_set, valid_data, ds_info

def create_model(num_classes=10, img_size=224):
    inputs = layers.Input(shape=(img_size, img_size, 3))
    x = tf.cast(inputs, tf.float32)
    backbone = EfficientNetB0(weights='imagenet', include_top=False)
    backbone.trainable = False
    x = backbone(x)
    x = layers.GlobalAveragePooling2D('channels_last')(x)
    x = layers.Dense(512, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    return model

In [None]:
train_set, test_set, calibration_set, validation_set, ds_info = prepare_dataset()
ori_model = create_model()
ori_model.fit(train_set,
          epochs=5,
          steps_per_epoch=(ds_info.splits['train'].num_examples // 512 + 1),
          )

_The full definitions of the functions_ `prepare_dataset` _and_ `create_model` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/tensorflow/tensorflow_inference_optimizer_optimize.ipynb).

## Obtain available acceleration combinations by `optimize`

### 1. Default search mode
To find the acceleration method with the minimum inference latency, import `InferenceOptimizer` and call its `optimize` method. The `optimize` method runs all possible acceleration combinations and outputs the results.

In [None]:
from bigdl.nano.tf.keras import InferenceOptimizer

opt = InferenceOptimizer()
opt.optimize(ori_model,
             x=calibration_set,
             latency_sample_num=10)

The example output of `opt.optimize` is shown below.

```bash
==========================Optimization Results==========================
 -------------------------------- ---------------------- --------------
|             method             |        status        | latency(ms)  |
 -------------------------------- ---------------------- --------------
|            original            |      successful      |   110.198    |
|              int8              |      successful      |    55.621    |
|         openvino_fp32          |      successful      |    30.763    |
|         openvino_int8          |      successful      |    33.872    |
|        onnxruntime_fp32        |      successful      |    23.38     |
|    onnxruntime_int8_qlinear    |      successful      |    9.836     |
|    onnxruntime_int8_integer    |      successful      |    12.899    |
 -------------------------------- ---------------------- --------------
Optimization cost 347.9s in total.
```
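
To read the table above in terms of relative gains, the latencies can be turned into speedups over the original model. The snippet below is a minimal sketch in plain Python that does not use BigDL-Nano at all: the numbers are copied by hand from the example output, and the names `latency_ms`, `speedup`, and `fastest` are chosen only for illustration.

```python
# Latencies in milliseconds, copied from the example output above.
latency_ms = {
    "original": 110.198,
    "int8": 55.621,
    "openvino_fp32": 30.763,
    "openvino_int8": 33.872,
    "onnxruntime_fp32": 23.38,
    "onnxruntime_int8_qlinear": 9.836,
    "onnxruntime_int8_integer": 12.899,
}

# Speedup of each method relative to the unoptimized model.
baseline = latency_ms["original"]
speedup = {method: baseline / ms for method, ms in latency_ms.items()}

# The method with the lowest measured latency.
fastest = min(latency_ms, key=latency_ms.get)

# Print the methods from largest to smallest speedup.
for method, ratio in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<28} {latency_ms[method]:>8.3f} ms  {ratio:5.1f}x")
print("fastest:", fastest)
```

Latencies vary between runs and machines, so these ratios describe only this example output.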

### 2. Search with accuracy supervision
If you care about a possible accuracy drop, you can specify the `validation_data`, `metric`, and `direction` parameters when calling `optimize` to enable validation:

In [None]:
from tensorflow.keras.metrics import CategoricalAccuracy

opt.optimize(ori_model,
             x=calibration_set,
             validation_data=validation_set,
             metric=CategoricalAccuracy(),
             direction="max",
             latency_sample_num=10)

The example output of `opt.optimize` is shown below.

```bash
==========================Optimization Results==========================
 -------------------------------- ---------------------- -------------- ----------------------
|             method             |        status        | latency(ms)  |     metric value     |
 -------------------------------- ---------------------- -------------- ----------------------
|            original            |      successful      |   106.692    |        0.996         |
|              int8              |      successful      |    55.652    |        0.996         |
|         openvino_fp32          |      successful      |    32.002    |        0.996*        |
|         openvino_int8          |      successful      |    33.648    |        0.995         |
|        onnxruntime_fp32        |      successful      |    25.639    |        0.996*        |
|    onnxruntime_int8_qlinear    |      successful      |    9.877     |        0.971         |
|    onnxruntime_int8_integer    |      successful      |     9.85     |        0.956         |
 -------------------------------- ---------------------- -------------- ----------------------
* means we assume the metric value of the traced model does not change, so we don't recompute metric value to save time.
Optimization cost 465.5s in total.
```

### 3. Filter acceleration methods
In some cases, you may just want to test or compare several specific methods. There are two ways to achieve this.

1. If you just want to test a few methods, you can set the `includes` parameter:

In [None]:
from tensorflow.keras.metrics import CategoricalAccuracy

opt.optimize(ori_model,
             x=calibration_set,
             validation_data=validation_set,
             metric=CategoricalAccuracy(),
             direction="max",
             includes=["openvino_fp32", "onnxruntime_fp32"],
             latency_sample_num=10)

The example output of `opt.optimize` is shown below.

```bash
==========================Optimization Results==========================
 -------------------------------- ---------------------- -------------- ----------------------
|             method             |        status        | latency(ms)  |     metric value     |
 -------------------------------- ---------------------- -------------- ----------------------
|            original            |      successful      |   108.209    |        0.994         |
|         openvino_fp32          |      successful      |    30.325    |        0.994*        |
|        onnxruntime_fp32        |      successful      |    31.313    |        0.994*        |
 -------------------------------- ---------------------- -------------- ----------------------
* means we assume the metric value of the traced model does not change, so we don't recompute metric value to save time.
Optimization cost 133.9s in total.
```

2. If you expect that some acceleration methods will not work for your model, will not work well, will run for too long, or will cause exceptions, you can avoid running them by specifying the `excludes` parameter:

In [None]:
from tensorflow.keras.metrics import CategoricalAccuracy

opt.optimize(ori_model,
             x=calibration_set,
             validation_data=validation_set,
             metric=CategoricalAccuracy(),
             direction="max",
             excludes=["int8", "onnxruntime_int8_integer"],
             latency_sample_num=10)

The example output of `opt.optimize` is shown below.

```bash
==========================Optimization Results==========================
 -------------------------------- ---------------------- -------------- ----------------------
|             method             |        status        | latency(ms)  |     metric value     |
 -------------------------------- ---------------------- -------------- ----------------------
|            original            |      successful      |   110.496    |        0.994         |
|         openvino_fp32          |      successful      |    30.778    |        0.994*        |
|         openvino_int8          |      successful      |    38.152    |        0.994         |
|        onnxruntime_fp32        |      successful      |    23.143    |        0.994*        |
|    onnxruntime_int8_qlinear    |      successful      |    12.721    |        0.963         |
 -------------------------------- ---------------------- -------------- ----------------------
* means we assume the metric value of the traced model does not change, so we don't recompute metric value to save time.
Optimization cost 323.0s in total.
```
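
The effect of `includes` and `excludes` on the set of methods that gets benchmarked can be sketched in plain Python. This is not BigDL-Nano's implementation: `select_methods` is a hypothetical helper, `ALL_METHODS` is simply the list of method names seen in the example outputs, and the rule that `original` is always kept is an assumption inferred from those outputs (the baseline row appears even when it is not listed in `includes`).

```python
# Method names as they appear in the example outputs (illustrative list).
ALL_METHODS = [
    "original",
    "int8",
    "openvino_fp32",
    "openvino_int8",
    "onnxruntime_fp32",
    "onnxruntime_int8_qlinear",
    "onnxruntime_int8_integer",
]


def select_methods(all_methods, includes=None, excludes=None):
    """Hypothetical helper: return the methods that would be benchmarked."""
    if includes is not None:
        # Keep only the requested methods, in their original order.
        selected = [m for m in all_methods if m in includes]
    else:
        selected = list(all_methods)
    if excludes is not None:
        # Drop anything the caller asked to skip.
        selected = [m for m in selected if m not in excludes]
    # Assumption: the unoptimized baseline is always measured.
    if "original" not in selected:
        selected.insert(0, "original")
    return selected


print(select_methods(ALL_METHODS, includes=["openvino_fp32", "onnxruntime_fp32"]))
print(select_methods(ALL_METHODS, excludes=["int8", "onnxruntime_int8_integer"]))
```

Under these assumptions, the two calls reproduce the method columns of the two example outputs in this section.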

## Obtain specific model

You can call the `get_best_model` method to obtain the best model, with or without restrictions. Here we get the model with minimal latency whose accuracy drop is less than 5%.

In [5]:
from tensorflow.keras.metrics import CategoricalAccuracy

opt.optimize(ori_model,
             x=calibration_set,
             validation_data=validation_set,
             metric=CategoricalAccuracy(),
             direction="max",
             latency_sample_num=10)

acc_model, option = opt.get_best_model(accuracy_criterion=0.05)
print("When accuracy drop less than 5%, the model with minimal latency is: ", option)

When accuracy drop less than 5%, the model with minimal latency is:  inc + onnxruntime + integer


> 📝 **Note**
> 
>  If you want to find the best model with the `accuracy_criterion` parameter, make sure you have called `optimize` with validation data.
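
To make the selection rule concrete, the sketch below applies an accuracy restriction to the latency and metric values from the "Search with accuracy supervision" output and then picks the lowest latency. It is an illustration in plain Python, not the library's code: `pick_best` is a hypothetical function, and treating `accuracy_criterion` as a tolerated drop relative to the original model's metric is an assumption.

```python
# (latency in ms, metric value) copied from the accuracy-supervised output.
results = {
    "original": (106.692, 0.996),
    "int8": (55.652, 0.996),
    "openvino_fp32": (32.002, 0.996),
    "openvino_int8": (33.648, 0.995),
    "onnxruntime_fp32": (25.639, 0.996),
    "onnxruntime_int8_qlinear": (9.877, 0.971),
    "onnxruntime_int8_integer": (9.85, 0.956),
}


def pick_best(results, accuracy_criterion=None, direction="max"):
    """Hypothetical sketch: lowest-latency method within the allowed drop."""
    baseline_metric = results["original"][1]
    candidates = {}
    for method, (latency, metric) in results.items():
        if accuracy_criterion is not None:
            # For direction="max" a lower metric is worse; for "min" a higher one.
            if direction == "max":
                drop = baseline_metric - metric
            else:
                drop = metric - baseline_metric
            # Assumption: the criterion is relative to the baseline metric.
            if drop / abs(baseline_metric) > accuracy_criterion:
                continue
        candidates[method] = latency
    return min(candidates, key=candidates.get)


print(pick_best(results, accuracy_criterion=0.05))
print(pick_best(results, accuracy_criterion=0.01))
```

With a 5% tolerance the sketch selects `onnxruntime_int8_integer`, which corresponds to the `inc + onnxruntime + integer` option printed above. Tightening the tolerance to 1% rules out both ONNX Runtime int8 variants and leaves `onnxruntime_fp32` as the fastest remaining candidate.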

If you just want to obtain a specific model, even if it doesn't have the minimal latency, you can call the `get_model` method and specify `method_name`. Here we take `openvino_fp32` as an example:

In [None]:
openvino_model = opt.get_model(method_name='openvino_fp32')

## Inference

Then you can use the obtained model for inference.

In [6]:
for img, _ in tqdm(test_set):
    acc_model(img)
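
If you want to sanity-check the latency of a model on your own machine, a rough timing loop is often enough. The sketch below is a generic illustration and not how `optimize` measures latency internally: `average_latency_ms` is a hypothetical helper, and `dummy_model` is a stand-in callable so the snippet runs without TensorFlow.

```python
import time


def average_latency_ms(predict_fn, sample, num_runs=10, warmup=2):
    """Hypothetical helper: mean wall-clock time per call, in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        predict_fn(sample)
    start = time.perf_counter()
    for _ in range(num_runs):
        predict_fn(sample)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / num_runs


def dummy_model(batch):
    # Stand-in for a real model: doubles every element of the input.
    return [x * 2 for x in batch]


latency = average_latency_ms(dummy_model, list(range(1000)))
print(f"{latency:.3f} ms per call")
```

In the tutorial's setting, you would pass `acc_model` (or `ori_model`) as `predict_fn` and one batch of images as `sample`.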

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)