[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/inference_optimizer_optimize.ipynb)

# Find Acceleration Method with the Minimum Inference Latency using InferenceOptimizer

This example illustrates how to apply `InferenceOptimizer` to quickly find the acceleration method with the minimum inference latency for a trained model, either under specific restrictions or without restrictions.
In this example, we first train a ResNet18 model on the [cats and dogs dataset](https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip). Then, by calling `optimize()`, we can obtain all available acceleration combinations provided by BigDL-Nano for inference. By calling `get_best_model()`, we can get the best model under specific restrictions or without restrictions.

To run inference using the BigDL-Nano `InferenceOptimizer`, the following packages need to be installed first. We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to prepare the environment and installing the following packages in a conda environment.

You can create a conda environment by executing:

```
# "nano" is the conda environment name; you can use any name you like.
conda create -n nano python=3.7 setuptools=58.0.4  
conda activate nano
pip install --pre --upgrade bigdl-nano[pytorch,inference]  # install the nightly-built version
```


Then initialize the environment variables with the `bigdl-nano-init` script, which is installed together with bigdl-nano.

In [None]:
!source bigdl-nano-init

First, prepare the model and dataset. In this example, we use a pretrained ResNet18 model and train it on the [cats and dogs dataset](https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip).

In [None]:
import torch
from pathlib import Path
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.datasets.utils import download_and_extract_archive
from torch.utils.data import Subset, DataLoader
from bigdl.nano.pytorch import Trainer

def accuracy(pred, target):
    pred = torch.sigmoid(pred)
    return Accuracy()(pred, target)

def prepare_model_and_dataset(model_ft, val_size):
    DATA_URL = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"

    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    val_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    if not Path("data").exists():
        # download dataset
        download_and_extract_archive(url=DATA_URL, download_root="data", remove_finished=True)

    data_path = Path("data/cats_and_dogs_filtered")
    train_dataset = ImageFolder(data_path.joinpath("train"), transform=train_transform)
    val_dataset = ImageFolder(data_path.joinpath("validation"), transform=val_transform)

    indices = torch.randperm(len(val_dataset))
    val_dataset = Subset(val_dataset, indices=indices[:val_size])

    train_dataloader = DataLoader(dataset=train_dataset, batch_size=8, shuffle=True)
    val_dataloader = DataLoader(dataset=val_dataset, batch_size=8, shuffle=False)

    num_ftrs = model_ft.fc.in_features
    
    model_ft.fc = torch.nn.Linear(num_ftrs, 2)
    loss_ft = torch.nn.CrossEntropyLoss()
    optimizer_ft = torch.optim.Adam(model_ft.parameters(), lr=1e-3)

    # compile model
    model = Trainer.compile(model_ft, loss=loss_ft, optimizer=optimizer_ft, metrics=[accuracy])
    trainer = Trainer(max_epochs=1)
    trainer.fit(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)
    
    return model, train_dataset, val_dataset


In [None]:
from torchvision.models import resnet18

model = resnet18(pretrained=True)
_, train_dataset, val_dataset = prepare_model_and_dataset(model, val_size=500)

_The full definition of function_ `prepare_model_and_dataset` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/inference_optimizer_optimize.ipynb).

To find the acceleration method with the minimum inference latency, import `InferenceOptimizer` and call its `optimize` method. The `optimize` method runs all possible acceleration combinations and outputs the results; this takes about 2 minutes.

In [None]:
from bigdl.nano.pytorch import InferenceOptimizer
from torch.utils.data import DataLoader

# Define metric for accuracy calculation
def accuracy(pred, target):
    pred = torch.sigmoid(pred)
    return Accuracy()(pred, target)

optimizer = InferenceOptimizer()

# To obtain the latency of a single sample, set batch_size=1
train_dataloader = DataLoader(train_dataset, batch_size=1)
val_dataloader = DataLoader(val_dataset)

optimizer.optimize(model=model,
                   training_data=train_dataloader,
                   validation_data=val_dataloader,
                   metric=accuracy,
                   direction="max",
                   thread_num=1,
                   latency_sample_num=30)

The example output of `optimizer.optimize` is shown below.

```
==========================Optimization Results==========================
 -------------------------------- ---------------------- -------------- ----------------------
|             method             |        status        | latency(ms)  |       accuracy       |
 -------------------------------- ---------------------- -------------- ----------------------
|            original            |      successful      |    41.304    |         0.86         |
|           fp32_ipex            |      successful      |    38.624    |    not recomputed    |
|              bf16              |   lack dependency    |     None     |         None         |
|           bf16_ipex            |   lack dependency    |     None     |         None         |
|              int8              |      successful      |    23.108    |        0.852         |
|            jit_fp32            |    early stopped     |    75.324    |         None         |
|         jit_fp32_ipex          |      successful      |    65.829    |    not recomputed    |
|  jit_fp32_ipex_channels_last   |    early stopped     |    90.795    |         None         |
|         openvino_fp32          |      successful      |    40.322    |    not recomputed    |
|         openvino_int8          |      successful      |    3.871     |        0.834         |
|        onnxruntime_fp32        |      successful      |    30.08     |    not recomputed    |
|    onnxruntime_int8_qlinear    |      successful      |    18.662    |        0.846         |
|    onnxruntime_int8_integer    |   fail to convert    |     None     |         None         |
 -------------------------------- ---------------------- -------------- ----------------------
Optimization cost 74.2s in total.
```

> 📝 **Note**
> 
> When specifying the `training_data` parameter, make sure to set the batch size of the training data to the batch size you plan to use in the real deployment environment, as the batch size may affect latency.
>
> For more information, please refer to the [API Documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl.nano.pytorch.InferenceOptimizer).
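
To make the batch-size note concrete, here is a minimal, library-independent sketch of how a per-call latency can be estimated from a fixed number of timed calls, in the spirit of `latency_sample_num=30`. It is not BigDL-Nano's implementation: the helper `measure_latency_ms`, the `warmup` parameter, the use of the median, and the stand-in `fake_model` are all assumptions made for illustration.

```python
import statistics
import time


def measure_latency_ms(predict, sample, sample_num=30, warmup=3):
    """Return the median wall-clock latency of predict(sample) in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(sample_num):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional scheduling hiccups than the mean.
    return statistics.median(timings)


def fake_model(batch):
    # Hypothetical stand-in for a model: the cost grows with the batch length.
    return [sum(x * x for x in range(2000)) for _ in batch]


latency_bs1 = measure_latency_ms(fake_model, [0])
latency_bs8 = measure_latency_ms(fake_model, [0] * 8)
print(f"batch_size=1: {latency_bs1:.3f} ms, batch_size=8: {latency_bs8:.3f} ms")
```

Because the measured latency depends on the batch passed in, timing with a batch size other than the one used in deployment gives numbers that do not transfer.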

You can call the `get_best_model` method to obtain the best model under specific restrictions or without restrictions. Here we get the model with minimal latency whose accuracy drop is less than 5%.

In [None]:
acc_model, option = optimizer.get_best_model(accuracy_criterion=0.05)
print("When accuracy drop is less than 5%, the model with minimal latency is: ", option)
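
The selection that `get_best_model` performs can be illustrated with plain Python on the example output shown earlier. This is only a sketch under stated assumptions, not BigDL-Nano's code: the helper `pick_best` is hypothetical, the criterion is assumed to be a relative drop against the `original` accuracy for a higher-is-better metric, and methods marked `not recomputed` are assumed to keep the original accuracy.

```python
# Rows copied from the example output: (method, status, latency_ms, accuracy).
# "not recomputed" marks methods whose accuracy was not re-evaluated.
RESULTS = [
    ("original", "successful", 41.304, 0.86),
    ("fp32_ipex", "successful", 38.624, "not recomputed"),
    ("int8", "successful", 23.108, 0.852),
    ("jit_fp32", "early stopped", 75.324, None),
    ("jit_fp32_ipex", "successful", 65.829, "not recomputed"),
    ("openvino_fp32", "successful", 40.322, "not recomputed"),
    ("openvino_int8", "successful", 3.871, 0.834),
    ("onnxruntime_fp32", "successful", 30.08, "not recomputed"),
    ("onnxruntime_int8_qlinear", "successful", 18.662, 0.846),
]


def pick_best(results, accuracy_criterion=None):
    """Return (method, latency_ms) with the lowest latency that meets the criterion."""
    baseline = next(acc for name, _, _, acc in results if name == "original")
    best = None
    for name, status, latency, acc in results:
        if status != "successful":
            continue  # skip methods that failed, stopped early, or lack dependencies
        if accuracy_criterion is not None and isinstance(acc, float):
            # Assumption: relative accuracy drop, metric where higher is better.
            if (baseline - acc) / baseline > accuracy_criterion:
                continue
        if best is None or latency < best[1]:
            best = (name, latency)
    return best


print(pick_best(RESULTS, accuracy_criterion=0.05))  # ('openvino_int8', 3.871)
print(pick_best(RESULTS, accuracy_criterion=0.02))  # ('onnxruntime_int8_qlinear', 18.662)
```

With a 5% tolerance the fastest method, `openvino_int8`, qualifies (a drop of about 3% relative to 0.86). Tightening the tolerance to 2% excludes it, and the next-fastest method that stays within the limit is chosen.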

Then you can use the best model for inference.

In [None]:
with InferenceOptimizer.get_context(acc_model):
    x = next(iter(train_dataloader))[0]
    output = acc_model(x)

To export the best model, simply call the `save` method and pass it the target path.

In [None]:
save_dir = "./best_model"
InferenceOptimizer.save(acc_model, save_dir)

The model files will be saved in the `./best_model` directory. Depending on the type in the `option` of the best model, you only need to keep the following files for further use.

- **OpenVINO**
    
    `ov_saved_model.bin`: Contains the binary weights and biases of the model
    
    `ov_saved_model.xml`: Model checkpoint for general use; describes the model structure

- **onnxruntime**

    `onnx_saved_model.onnx`: Model checkpoint for general use; describes the model structure
    
- **int8**

    `best_model.pt`: The model optimized by Intel® Neural Compressor

- **ipex | channels_last | jit**
    
    `ckpt.pt`: If `jit` is in the option, it stores the model optimized using just-in-time compilation; otherwise, it stores the original model weights saved by `torch.save(model.state_dict())`.

- **Others**
    
    `saved_weight.pt`: Saved by `torch.save(model.state_dict())`.
    
    If `bf16` is in the option, the saved model weights are of bf16 dtype; otherwise, they are of fp32 dtype.
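
The mapping above can be turned into a small check that the files you need are present before copying a saved model elsewhere. This is a sketch under assumptions: the helpers `expected_files` and `missing_files` are hypothetical, `option` is assumed to be a string containing the keywords listed above, and the order in which keywords are matched (first match wins, top to bottom) is an assumption rather than documented behavior.

```python
from pathlib import Path

# File names are the ones listed above; the matching order is an assumption.
EXPECTED_FILES = [
    ("openvino", ["ov_saved_model.bin", "ov_saved_model.xml"]),
    ("onnxruntime", ["onnx_saved_model.onnx"]),
    ("int8", ["best_model.pt"]),
    ("ipex", ["ckpt.pt"]),
    ("channels_last", ["ckpt.pt"]),
    ("jit", ["ckpt.pt"]),
]


def expected_files(option):
    """Return the file names to keep for a given option string."""
    for keyword, files in EXPECTED_FILES:
        if keyword in option:
            return files
    # Everything else falls under "Others".
    return ["saved_weight.pt"]


def missing_files(save_dir, option):
    """Return the expected files that are not present in save_dir."""
    root = Path(save_dir)
    return [name for name in expected_files(option) if not (root / name).is_file()]


print(expected_files("openvino_int8"))
print(expected_files("jit_fp32_ipex"))
print(expected_files("bf16"))
print(missing_files("./best_model", "openvino_int8"))
```

An empty list from `missing_files` suggests the directory contains what the chosen option needs; any names it returns are files to look for before deploying.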

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)