[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_bf16.ipynb)

# Accelerate PyTorch Inference with BF16 mixed precision

> 📝 **Quick Note**
>
> * `bf16`: `InferenceOptimizer.quantize(model, precision='bf16')`
> * `bf16 + ipex`: `InferenceOptimizer.quantize(model, precision='bf16', use_ipex=True)`
> * `bf16 + jit`: `InferenceOptimizer.quantize(model, precision='bf16', accelerator="jit")`
> * `bf16 + channels_last`: `InferenceOptimizer.quantize(model, precision='bf16', channels_last=True)`

To accelerate the model in bf16 precision, the following dependencies need to be installed first:

In [None]:
# for BigDL-Nano
!pip install --pre --upgrade bigdl-nano[pytorch]  # install the nightly-built version
# !source bigdl-nano-init


> 📝 **Note**
>
> We recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

Let's take a [ResNet-18 model](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html) pretrained on the ImageNet dataset as an example. First, we load the model:

In [None]:
import torch
from torchvision.models import resnet18

model_ft = resnet18(pretrained=True)

To accelerate the model in bf16 precision, we need to import `InferenceOptimizer`.

In [None]:
from bigdl.nano.pytorch import InferenceOptimizer

> 📝 **Note**
>
> Platforms without hardware acceleration for BFloat16 may show poor BFloat16 inference performance. In other words, only Cooper Lake and Sapphire Rapids Xeon processors can deliver the best performance.
>
> All of the following methods can be combined as the user wishes. For example, `BF16+IPEX+jit+channels_last` is also supported. To search for the best configuration automatically, see [here](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize.html).
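
Before benchmarking, it can help to check whether the current machine advertises the BF16 instructions mentioned above. The following is a minimal sketch, assuming a Linux host where CPU flags are listed in `/proc/cpuinfo`; the helper names `bf16_flags_in` and `detect_bf16_flags` are illustrative and not part of BigDL-Nano.

```python
from pathlib import Path

# Flag names as the Linux kernel reports them in /proc/cpuinfo.
BF16_FLAGS = ("avx512_bf16", "amx_bf16")


def bf16_flags_in(cpuinfo_text):
    """Return the BF16-related CPU flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in BF16_FLAGS)
    return sorted(found)


def detect_bf16_flags(path="/proc/cpuinfo"):
    """Read the flags of this machine; return [] if the file does not exist."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return []  # e.g. macOS or Windows, where this file is absent
    return bf16_flags_in(cpuinfo.read_text())


flags = detect_bf16_flags()
print("BF16 hardware flags:", flags if flags else "none detected")
```

An empty result only means the flags were not found this way; it does not prevent the bf16 model from running, it just suggests the speedup may be limited.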

### BF16

Users will get a model that can utilize mixed precision instructions (e.g., AVX512_bf16, AMX_bf16) when inference runs inside `with InferenceOptimizer.get_context(bf16_model):`.

In [None]:
x = torch.rand(2, 3, 224, 224)
bf16_model = InferenceOptimizer.quantize(model_ft,
                                         precision='bf16')
with InferenceOptimizer.get_context(bf16_model):
    y_hat = bf16_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
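
To get a feel for what bf16 precision costs numerically, recall that bfloat16 keeps the sign and 8 exponent bits of float32 but only 7 mantissa bits. The sketch below emulates that rounding with the standard library only; `to_bf16` is an illustrative helper for finite values, not something BigDL-Nano or PyTorch exposes.

```python
import struct


def to_bf16(value):
    """Round a finite Python float to the nearest bfloat16 value.

    bfloat16 is the upper 16 bits of the float32 bit pattern, so we
    round the float32 pattern to those bits (round-to-nearest-even).
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Bias of 0x7FFF, plus 1 when the kept part is odd, gives ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) & 0xFFFFFFFF) >> 16 << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for v in (1.0, 0.1, 3.14159265):
    print(v, "->", to_bf16(v))
```

Values keep roughly two to three significant decimal digits, while the representable range stays the same as float32. That is usually enough for the `argmax` over class scores used in this tutorial, but it is worth comparing predictions against the original model for your own task.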

### BF16 + IPEX
Users will get a model that is optimized by Intel® Extension for PyTorch (in eager mode) and can utilize mixed precision instructions (e.g., AVX512_bf16, AMX_bf16) when inference runs inside `with InferenceOptimizer.get_context(ipex_model):`.

In [None]:
ipex_model = InferenceOptimizer.quantize(model_ft,
                                         precision='bf16',
                                         use_ipex=True)
with InferenceOptimizer.get_context(ipex_model):
    y_hat = ipex_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)

### BF16 + jit

Users will get a model that is traced by `torch.jit` and can utilize mixed precision instructions (e.g., AVX512_bf16, AMX_bf16) when inference runs inside `with InferenceOptimizer.get_context(jit_model):`.

In [None]:
jit_model = InferenceOptimizer.quantize(model_ft,
                                        precision='bf16',
                                        accelerator="jit",
                                        input_sample=torch.rand(1, 3, 224, 224))
with InferenceOptimizer.get_context(jit_model):
    y_hat = jit_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)

> 📝 **Note**
>
> `input_sample` is the parameter that lets the jit accelerator know the **shape** of the model input, so neither the batch size nor the specific values matter for `input_sample`. If our test dataset consists of images with $224 \times 224$ pixels, we can use `torch.rand(1, 3, 224, 224)` for `input_sample` here.
>

### BF16 + channels_last
Users will get a model that uses the channels_last memory format (NHWC), an alternative to the default NCHW ordering, and can utilize mixed precision instructions (e.g., AVX512_bf16, AMX_bf16) when inference runs inside `with InferenceOptimizer.get_context(channels_last_model):`.

In [None]:
channels_last_model = InferenceOptimizer.quantize(model_ft,
                                                  precision='bf16',
                                                  channels_last=True)
with InferenceOptimizer.get_context(channels_last_model):
    y_hat = channels_last_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
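
The difference between the default NCHW ordering and channels_last is only in how elements are laid out in memory; the logical shape stays `(N, C, H, W)`. The sketch below computes the element strides for both layouts in plain Python for the `(2, 3, 224, 224)` input used in this tutorial. The helper names are illustrative.

```python
def contiguous_strides(shape):
    """Element strides of a dense row-major tensor (default NCHW layout)."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Element strides when the same (N, C, H, W) tensor is stored as NHWC."""
    n, c, h, w = shape
    # Channels become the fastest-moving dimension in memory.
    return (h * w * c, 1, w * c, c)


shape = (2, 3, 224, 224)
print("NCHW strides:         ", contiguous_strides(shape))
print("channels_last strides:", channels_last_strides(shape))
```

With channels_last, the channel values of one pixel sit next to each other in memory (stride 1), which is the layout that convolution kernels on CPU often prefer.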

### Accelerate inference using combined methods
You can combine any of the methods mentioned above to try to accelerate your model. However, the gains do not simply stack: using more methods is not always better.
You should benchmark several combinations to find the one that works best for your model.

In [None]:
# bf16 + IPEX + JIT + channels_last
jit_ipex_model = InferenceOptimizer.quantize(model_ft,
                                             precision='bf16',
                                             accelerator="jit",
                                             use_ipex=True,
                                             channels_last=True,
                                             input_sample=torch.rand(1, 3, 224, 224))
with InferenceOptimizer.get_context(jit_ipex_model):
    y_hat = jit_ipex_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
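
Since the best combination has to be found by measurement, one option is to enumerate the keyword-argument combinations shown in this tutorial and time each resulting model. The following is a minimal sketch using only the standard library; `candidate_configs` and `mean_latency` are hypothetical helpers, and the timing loop is deliberately simple.

```python
import time
from itertools import product


def candidate_configs(input_sample=None):
    """Build quantize() keyword arguments for every bf16 combination above."""
    configs = []
    for use_ipex, use_jit, channels_last in product([False, True], repeat=3):
        kwargs = {"precision": "bf16"}
        if use_ipex:
            kwargs["use_ipex"] = True
        if use_jit:
            kwargs["accelerator"] = "jit"
            kwargs["input_sample"] = input_sample  # needed for tracing
        if channels_last:
            kwargs["channels_last"] = True
        configs.append(kwargs)
    return configs


def mean_latency(fn, repeats=20, warmup=3):
    """Average wall-clock seconds per call of fn, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


for kwargs in candidate_configs(input_sample="<sample tensor>"):
    print(kwargs)
```

In this tutorial's setting, each `kwargs` could be passed as `InferenceOptimizer.quantize(model_ft, **kwargs)` with `input_sample=torch.rand(1, 3, 224, 224)`, and the call `model(x)` timed with `mean_latency` inside `with InferenceOptimizer.get_context(model):`.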

> 📚 **Related Readings**
>
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)