[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/quantize_pytorch_inference_inc.ipynb)

# Quantize PyTorch Model for Inference using Intel Neural Compressor

With Intel Neural Compressor (INC) as the quantization engine, you can apply the `InferenceOptimizer.quantize` API to perform post-training quantization on your PyTorch `nn.Module`. `InferenceOptimizer.quantize` also supports ONNXRuntime acceleration at the same time if you specify `accelerator='onnxruntime'`. Either option takes only a few lines of code.

To quantize your model with INC, the following dependencies need to be installed first:

In [None]:
# for BigDL-Nano
!pip install --pre --upgrade bigdl-nano[pytorch,inference]  # install the nightly-built version
# !source bigdl-nano-init

> 📝 **Note**
> 
> We recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

Let's take a [ResNet-18 model](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html) pretrained on the ImageNet dataset and finetuned on the [OxfordIIITPet dataset](https://pytorch.org/vision/main/generated/torchvision.datasets.OxfordIIITPet.html) as an example:

In [None]:
# Define the finetune function
import torch
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet
from torch.utils.data.dataloader import DataLoader
from torchvision.models import resnet18
from bigdl.nano.pytorch import Trainer
from torchmetrics import Accuracy

def finetune_pet_dataset(model_ft):

    train_transform = transforms.Compose([transforms.Resize(256),
                                          transforms.RandomCrop(224),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ColorJitter(brightness=.5, hue=.3),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406],
                                                               [0.229, 0.224, 0.225])])
    val_transform = transforms.Compose([transforms.Resize(256),
                                        transforms.CenterCrop(224),
                                        transforms.ToTensor(),
                                        transforms.Normalize([0.485, 0.456, 0.406],
                                                             [0.229, 0.224, 0.225])])

    # apply data augmentation to the train_dataset
    train_dataset = OxfordIIITPet(root="/tmp/data",
                                  transform=train_transform,
                                  download=True)
    val_dataset = OxfordIIITPet(root="/tmp/data",
                                transform=val_transform)

    # obtain training indices that will be used for validation
    indices = torch.randperm(len(train_dataset))
    val_size = len(train_dataset) // 4
    train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])
    val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])

    # prepare data loaders
    train_dataloader = DataLoader(train_dataset, batch_size=32)
    val_dataloader = DataLoader(val_dataset, batch_size=32)

    num_ftrs = model_ft.fc.in_features

    # here the size of each output sample is set to 37.
    model_ft.fc = torch.nn.Linear(num_ftrs, 37)
    loss_ft = torch.nn.CrossEntropyLoss()
    optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

    # compile our model with loss function, optimizer.
    model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[Accuracy()])
    trainer = Trainer(max_epochs=1)
    trainer.fit(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)

    return model, train_dataset, val_dataset

In [None]:
from torchvision.models import resnet18

model = resnet18(pretrained=True)
_, train_dataset, val_dataset = finetune_pet_dataset(model)

_The full definition of the function_ `finetune_pet_dataset` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/quantize_pytorch_inference_inc.ipynb).

Then we set the model in evaluation mode:

In [None]:
model.eval()

To enable quantization using INC for inference, you could simply **import BigDL-Nano** `InferenceOptimizer`**, and use** `InferenceOptimizer` **to quantize your PyTorch model**:

In [None]:
from bigdl.nano.pytorch import InferenceOptimizer

q_model = InferenceOptimizer.quantize(model, 
                                      calib_data=DataLoader(train_dataset, batch_size=32))

If you want to enable ONNXRuntime acceleration at the same time, you can just specify the `accelerator` parameter:

In [None]:
from bigdl.nano.pytorch import InferenceOptimizer

q_model = InferenceOptimizer.quantize(model,
                                      accelerator='onnxruntime',
                                      calib_data=DataLoader(train_dataset, batch_size=32))

> 📝 **Note**
> 
> By default, `InferenceOptimizer` quantizes your PyTorch `nn.Module` through **static** post-training quantization. In this case, `calib_data` (the calibration data) is required. The batch size of `calib_data` is not important, since calibration is intended to read only 100 samples. The calibration data does not need to contain labels.
> 
> If you would like to use dynamic post-training quantization instead, set the parameter `approach='dynamic'`. In this case, `calib_data` should be `None`. Compared to dynamic quantization, static quantization could lead to faster inference, as it eliminates the data conversion costs between layers.
> 
> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl.nano.pytorch.InferenceOptimizer.quantize) for more information on `InferenceOptimizer.quantize`.
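
To make the difference between the two approaches in the note above more concrete, here is a minimal, library-free sketch of symmetric int8 quantization. It is only a conceptual illustration under simplifying assumptions, not the algorithm INC uses internally; the helper names `symmetric_scale`, `quantize` and `dequantize` are hypothetical and are not part of BigDL-Nano or INC. In the static case the scale is fixed once from calibration samples, while in the dynamic case it is recomputed from each input at inference time.

```python
# Conceptual sketch only: symmetric int8 quantization in plain Python.
# All helper names are hypothetical; this is not INC's implementation.

def symmetric_scale(values, num_bits=8):
    """Return the scale that maps the largest magnitude in `values` to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clip to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map integer steps back to approximate floating-point values."""
    return [q * scale for q in q_values]

# Static approach: the scale is computed once, ahead of time, from calibration data.
calibration_samples = [[-1.0, 0.5, 2.0], [0.25, -0.75, 1.5]]
static_scale = symmetric_scale([v for sample in calibration_samples for v in sample])

# A new input at inference time; 4.0 lies outside the calibrated range.
new_input = [0.1, -0.4, 4.0]

# Static: reuse the precomputed scale (no extra work per input, but 4.0 is clipped).
static_q = quantize(new_input, static_scale)

# Dynamic: derive the scale from the input itself (extra work on every call).
dynamic_scale = symmetric_scale(new_input)
dynamic_q = quantize(new_input, dynamic_scale)

print("static :", static_q, dequantize(static_q, static_scale))
print("dynamic:", dynamic_q, dequantize(dynamic_q, dynamic_scale))
```

The sketch also hints at why the calibration data should resemble the data seen at inference time: values outside the calibrated range are clipped when a fixed scale is reused.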

You could then do the normal inference steps with the quantized model:

In [None]:
with InferenceOptimizer.get_context(q_model):
    x = torch.stack([val_dataset[0][0], val_dataset[1][0]])
    # use the quantized model here
    y_hat = q_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
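
If you would like to check whether quantization actually speeds up inference on your machine, a simple timing helper such as the sketch below may be useful. The function `measure_latency` and the stand-in `dummy_predict` are hypothetical names introduced only for illustration; they are not part of BigDL-Nano. The stand-in callable is there so that the snippet runs on its own.

```python
import statistics
import time

def measure_latency(predict_fn, sample, warmup=3, runs=10):
    """Return the median wall-clock time (in seconds) of predict_fn(sample)."""
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for _ in range(warmup):
        predict_fn(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical stand-in for a model; replace it with `model` or `q_model`
# and pass the tensor `x` from the cell above as the sample.
def dummy_predict(batch):
    return [sum(row) for row in batch]

latency = measure_latency(dummy_predict, [[1.0, 2.0], [3.0, 4.0]])
print(f"median latency: {latency * 1000:.3f} ms")
```

In the notebook, you could call `measure_latency(q_model, x)` inside the `with InferenceOptimizer.get_context(q_model):` block and compare the result with `measure_latency(model, x)` for the original model. Actual numbers depend on your hardware and environment.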

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)