[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/openvino/accelerate_inference_openvino_gpu.ipynb)

# Accelerate Inference on Intel GPUs Using OpenVINO

You can use `InferenceOptimizer.trace(..., accelerator='openvino', device='GPU')` to enable OpenVINO acceleration for inference on Intel GPUs, both integrated and discrete. BigDL-Nano also supports quantization with the OpenVINO accelerator on Intel GPUs through `InferenceOptimizer.quantize(..., accelerator='openvino', device='GPU', precision=...)`, where `precision` is `'fp16'` or `'int8'`. Either way, it only takes a few lines of code.

To apply OpenVINO acceleration, you need to install BigDL-Nano first:

In [None]:
# for pytorch users
!pip install --pre --upgrade bigdl-nano[pytorch,inference] # install the nightly-built version

!source bigdl-nano-init

In [None]:
# for tensorflow users
!pip install --pre --upgrade bigdl-nano[tensorflow,inference] # install the nightly-built version

!source bigdl-nano-init

> 📝 **Note**
> 
> We recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

> ⚠️ **Warning**
>
> Errors may occur when using the `InferenceOptimizer.trace(..., accelerator='openvino')` API on CentOS, because the latest version of `openvino-dev` is not supported on CentOS.

Before starting this guide, you can use the code below to list the available Intel GPU devices on your machine. You can run inference on any one of them.

In [None]:
from openvino.runtime import Core
core = Core()
print(core.available_devices)

`core.available_devices` returns a list of the available devices. Intel GPUs appear under the following names:

|output|corresponding GPU device(s)|
|-|-|
|`GPU`|alias for `GPU.0`, integrated GPU|
|`GPU.X`|enumeration of GPUs, `X` - id of the GPU device|
|`GPU.X.Y`|specific tile in a multi-tile architecture, `X` - id of the GPU device, `Y` - id of the tile within device `X`|

For more information about the device naming convention of OpenVINO, you can refer to this [page](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html#device-naming-convention).
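
The naming convention above can also be handled programmatically. The following is a minimal sketch and not part of OpenVINO or BigDL-Nano: `parse_gpu_device` and `gpu_devices` are hypothetical helper names, and the sample device list is made up for illustration (on a real machine it would come from `core.available_devices`).

```python
from typing import List, Optional, Tuple


def parse_gpu_device(name: str) -> Optional[Tuple[int, Optional[int]]]:
    """Split 'GPU', 'GPU.X' or 'GPU.X.Y' into (device_id, tile_id).

    Returns None for names that do not follow the GPU naming
    convention (e.g. 'CPU'). Hypothetical helper for illustration only.
    """
    parts = name.split(".")
    if parts[0] != "GPU" or len(parts) > 3:
        return None
    if not all(p.isdigit() for p in parts[1:]):
        return None
    # Plain 'GPU' is an alias for 'GPU.0'
    device_id = int(parts[1]) if len(parts) > 1 else 0
    tile_id = int(parts[2]) if len(parts) > 2 else None
    return device_id, tile_id


def gpu_devices(available: List[str]) -> List[str]:
    """Keep only the entries that follow the GPU naming convention."""
    return [d for d in available if parse_gpu_device(d) is not None]


# Made-up example list; replace with core.available_devices on a real machine
sample_devices = ["CPU", "GPU.0", "GPU.1"]
print(gpu_devices(sample_devices))
print(parse_gpu_device("GPU.1.0"))
```

Any name kept by `gpu_devices` could then be passed as the `device` argument in the examples that follow.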

## PyTorch example

Let's take a [ResNet-18 model](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html) pretrained on the ImageNet dataset as an example. Note that you don't have to transfer the model to the GPU or set it to evaluation mode, since `InferenceOptimizer` will handle both automatically.

In [None]:
# Define the finetune function
import torch
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet
from torch.utils.data.dataloader import DataLoader
from torchvision.models import resnet18
from bigdl.nano.pytorch import Trainer
from torchmetrics.classification import MulticlassAccuracy


def finetune_pet_dataset(model_ft):

    train_transform = transforms.Compose([transforms.Resize(256),
                                          transforms.RandomCrop(224),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ColorJitter(brightness=.5, hue=.3),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406],
                                                               [0.229, 0.224, 0.225])])
    val_transform = transforms.Compose([transforms.Resize(256),
                                        transforms.CenterCrop(224),
                                        transforms.ToTensor(),
                                        transforms.Normalize([0.485, 0.456, 0.406],
                                                             [0.229, 0.224, 0.225])])

    # apply data augmentation to the train_dataset
    train_dataset = OxfordIIITPet(root="/tmp/data",
                                  transform=train_transform,
                                  download=True)
    val_dataset = OxfordIIITPet(root="/tmp/data",
                                transform=val_transform)

    # obtain training indices that will be used for validation
    indices = torch.randperm(len(train_dataset))
    val_size = len(train_dataset) // 4
    train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])
    val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])

    # prepare data loaders
    train_dataloader = DataLoader(train_dataset, batch_size=32)
    val_dataloader = DataLoader(val_dataset, batch_size=32)

    num_ftrs = model_ft.fc.in_features

    # here the size of each output sample is set to 37.
    model_ft.fc = torch.nn.Linear(num_ftrs, 37)
    loss_ft = torch.nn.CrossEntropyLoss()
    optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

    # compile our model with loss function, optimizer.
    model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[MulticlassAccuracy(num_classes=37)])
    trainer = Trainer(max_epochs=1)
    trainer.fit(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)

    return model, train_dataset, val_dataset

In [None]:
from torchvision.models import resnet18

pt_model = resnet18(pretrained=True)
_, train_dataset, val_dataset = finetune_pet_dataset(pt_model)

_The full definition of function_ `finetune_pet_dataset` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/openvino/accelerate_inference_openvino_gpu.ipynb).

To enable OpenVINO acceleration for your PyTorch inference pipeline on Intel GPUs, **the only change you need to make is to import BigDL-Nano** `InferenceOptimizer`**, and trace your PyTorch model to convert it into an OpenVINO accelerated module for inference, specifying** `device='GPU'`.

> 📝 **Note**
> 
> By setting `device` to `'GPU'`, inference will be conducted on the default Intel GPU device. You can change to other devices (`'GPU.X'` / `'GPU.X.Y'`) instead.

In [None]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ov_model = InferenceOptimizer.trace(pt_model,
                                    accelerator="openvino",
                                    input_sample=torch.rand(1, 3, 224, 224),
                                    device='GPU')

> 📝 **Note**
> 
> `input_sample` tells the OpenVINO accelerator the **shape** of the model input, so neither its batch size nor its specific values matter. If the test dataset consists of images with $224 \times 224$ pixels, we can use `torch.rand(1, 3, 224, 224)` as `input_sample` here.
> 
> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl.nano.pytorch.InferenceOptimizer.trace) for more information on `InferenceOptimizer.trace`.

If you want to quantize your model using the OpenVINO Post-training Optimization Tool, you can call `InferenceOptimizer.quantize`.

* For FP16 quantization:

In [None]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ov_model = InferenceOptimizer.quantize(pt_model,
                                       accelerator='openvino',
                                       input_sample=torch.rand(1, 3, 224, 224),
                                       device='GPU',
                                       precision='fp16')

* For INT8 quantization:

In [None]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ov_model = InferenceOptimizer.quantize(pt_model,
                                       accelerator='openvino',
                                       input_sample=torch.rand(1, 3, 224, 224),
                                       device='GPU',
                                       precision='int8',
                                       calib_data=DataLoader(train_dataset, batch_size=32))

> 📝 **Note**
> 
> For INT8 quantization, we adopt the Post-training Optimization Tool provided by the OpenVINO toolkit, which only supports **static** post-training quantization, so `calib_data` (calibration data) is always required when `accelerator='openvino'`. The batch size is not important here, because calibration is intended to read 100 samples regardless. The calibration data does not need to contain labels.
> 
> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl.nano.pytorch.InferenceOptimizer.quantize) for more information on `InferenceOptimizer.quantize`.
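
To see why the batch size of the calibration loader is not important, consider the sketch below. It is not BigDL-Nano's or OpenVINO's implementation: `batched` and `take_samples` are hypothetical helpers, and plain Python lists stand in for a dataset and its batches. It only illustrates that drawing a fixed number of samples (100, as mentioned in the note above) yields the same data however those samples are batched.

```python
from typing import Iterable, Iterator, List, Sequence


def batched(data: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches, mimicking a data loader without shuffling."""
    for start in range(0, len(data), batch_size):
        yield list(data[start:start + batch_size])


def take_samples(batches: Iterable[List[int]], limit: int = 100) -> List[int]:
    """Collect at most `limit` individual samples from an iterable of batches."""
    samples: List[int] = []
    for batch in batches:
        for sample in batch:
            if len(samples) == limit:
                return samples
            samples.append(sample)
    return samples


data = list(range(1000))  # stand-in for a dataset of 1000 items
small = take_samples(batched(data, batch_size=8))
large = take_samples(batched(data, batch_size=32))
print(len(small), small == large)
```

Both calls collect the same 100 items, so the choice of `batch_size` only affects how many batches are read, not which samples are used.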

You can then run inference as usual with the model optimized by OpenVINO:

In [None]:
with InferenceOptimizer.get_context(ov_model):
    x = torch.rand(2, 3, 224, 224)
    # use the optimized model here
    y_hat = ov_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
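
To check whether the optimized model is actually faster on your hardware, you can time it against the original model on the same input. The helper below is a generic sketch with a hypothetical name (`average_latency`). It works with any callable, and a dummy function stands in for the model so that the snippet runs without a GPU. With the example above, you would call it on `ov_model` inside `InferenceOptimizer.get_context(ov_model)` with an input such as `x`.

```python
import time
from typing import Any, Callable


def average_latency(fn: Callable[[Any], Any], x: Any,
                    warmup: int = 3, runs: int = 20) -> float:
    """Return the mean wall-clock time of fn(x) in milliseconds."""
    # Warm-up calls are not timed, since first calls often include one-off setup cost
    for _ in range(warmup):
        fn(x)
    start = time.perf_counter()
    for _ in range(runs):
        fn(x)
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0


def dummy_model(x):
    # Stand-in for a real model call
    return [v * 2 for v in x]


latency_ms = average_latency(dummy_model, list(range(1000)))
print(f"average latency: {latency_ms:.3f} ms")
```

The warm-up iterations are a design choice rather than a requirement. They keep one-off initialization from skewing the average.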

## TensorFlow example

Let's take [MobileNetV2](https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v2/MobileNetV2) as an example. Note that you don't have to transfer the model to GPU at this step since `InferenceOptimizer` will handle this automatically.

In [None]:
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
import numpy as np

tf_model = MobileNetV2(weights=None, input_shape=[40, 40, 3], classes=10)

train_examples = np.random.random((100, 40, 40, 3))
train_labels = np.random.randint(0, 10, size=(100,))
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))

To enable OpenVINO acceleration for your TensorFlow inference pipeline on Intel GPUs, **the only change you need to make is to import BigDL-Nano** `InferenceOptimizer`**, and trace your TensorFlow model to convert it into an OpenVINO accelerated module for inference, specifying** `device='GPU'`.

> 📝 **Note**
> 
> By setting `device` to `'GPU'`, inference will be conducted on the default Intel GPU device. You can change to other devices (`'GPU.X'` / `'GPU.X.Y'`) instead.

In [None]:
from bigdl.nano.tf.keras import InferenceOptimizer

ov_model = InferenceOptimizer.trace(tf_model,
                                    accelerator="openvino",
                                    device='GPU')

If you want to quantize your model using the OpenVINO Post-training Optimization Tool, you can call `InferenceOptimizer.quantize`.

* For FP16 quantization:

In [None]:
from bigdl.nano.tf.keras import InferenceOptimizer

ov_model = InferenceOptimizer.quantize(tf_model,
                                       accelerator='openvino',
                                       device='GPU',
                                       precision='fp16')

* For INT8 quantization:

In [None]:
from bigdl.nano.tf.keras import InferenceOptimizer

ov_model = InferenceOptimizer.quantize(tf_model,
                                       accelerator='openvino',
                                       device='GPU',
                                       precision='int8',
                                       x=train_dataset)

> 📝 **Note**
> 
> For INT8 quantization, we adopt the Post-training Optimization Tool provided by the OpenVINO toolkit, which only supports **static** post-training quantization, so `x` (which serves as calibration data) is always required when `accelerator='openvino'`. The calibration data does not need to contain labels.
> 
> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/tensorflow.html#bigdl.nano.tf.keras.InferenceOptimizer.quantize) for more information on `InferenceOptimizer.quantize`.

You can then run inference as usual with the model optimized by OpenVINO:

In [None]:
x = tf.random.normal(shape=(100, 40, 40, 3))
# use the optimized model here
y_hat = ov_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
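
Because quantization can change model outputs slightly, it may be worth comparing the predicted classes of the original and optimized models on the same inputs. The following sketch uses a hypothetical helper, `agreement_rate`, and made-up prediction lists. With the examples in this guide, you would first convert the `predictions` tensors to lists (for instance with `.tolist()` in PyTorch or `.numpy().tolist()` in TensorFlow).

```python
from typing import Sequence


def agreement_rate(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where two prediction lists pick the same class."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    if len(reference) == 0:
        raise ValueError("prediction lists must not be empty")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


# Made-up class indices standing in for real model predictions
original_preds = [3, 1, 4, 1, 5, 9, 2, 6]
quantized_preds = [3, 1, 4, 1, 5, 9, 2, 7]
print(agreement_rate(original_preds, quantized_preds))
```

A rate close to 1.0 suggests the optimized model behaves like the original on these inputs. What counts as acceptable depends on your application.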

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Install/install_in_colab.html)