[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb)

# Automatic inference context management for PyTorch inference by `get_context`

You can use the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano automatically provides suitable context management for each accelerated model. The returned context usually combines some or all of the following three context managers:

1. ``torch.no_grad()`` to disable gradients, which is used for all models

2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run in mixed precision, which is provided for bf16-related models

3. ``torch.set_num_threads()`` to control the number of threads, which is used only if you specify ``thread_num`` when calling ``InferenceOptimizer.trace``/``quantize``/``optimize``
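
For comparison, the three pieces listed above could be combined by hand roughly as follows. This is a minimal sketch in plain PyTorch, not BigDL-Nano's implementation: the helper name `manual_inference_context` and its parameters are hypothetical, and restoring the previous thread number on exit is an assumption of this sketch.

```python
import contextlib

import torch
from torch import nn


@contextlib.contextmanager
def manual_inference_context(thread_num=None, use_bf16=False):
    """Hypothetical helper (not part of BigDL-Nano) that sets up the three pieces by hand."""
    original_threads = torch.get_num_threads()
    with contextlib.ExitStack() as stack:
        # 1. Disable gradient tracking, which applies to every model.
        stack.enter_context(torch.no_grad())
        # 2. Enable bf16 mixed precision only when requested.
        if use_bf16:
            stack.enter_context(torch.cpu.amp.autocast(dtype=torch.bfloat16))
        # 3. Change the thread number only when one is given.
        if thread_num is not None:
            torch.set_num_threads(thread_num)
        try:
            yield
        finally:
            # Assumption of this sketch: restore the previous thread number on exit.
            torch.set_num_threads(original_threads)


layer = nn.Linear(8, 2)  # small stand-in for a real model
with manual_inference_context(thread_num=2, use_bf16=True):
    out = layer(torch.rand(1, 8))
    grad_enabled_inside = torch.is_grad_enabled()
```

With `get_context`, you do not need to write or maintain such a helper, because Nano chooses the appropriate pieces for each accelerated model.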

To run inference with the BigDL-Nano InferenceOptimizer, you first need to install the packages below. We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to prepare the environment and installing the packages in a conda environment.

You can create a conda environment by executing:

```
# "nano" is the conda environment name; you can use any name you like.
conda create -n nano python=3.7 setuptools=58.0.4
conda activate nano
```
> 📝 **Note**
>
> During installation, you may see some warnings or errors about versions; you can ignore them.

In [None]:
# Necessary packages for inference acceleration
!pip install --pre --upgrade bigdl-nano[pytorch,inference]

Here we take a pretrained ResNet18 model as an example.

In [None]:
import torch
from torchvision.models import resnet18

model = resnet18(pretrained=True)

## InferenceOptimizer.trace

For a model accelerated by ``InferenceOptimizer.trace``, usage looks like the code below. Here we take `ipex` as an example.

In [3]:
from bigdl.nano.pytorch import InferenceOptimizer
ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)
input_sample = torch.rand(1, 3, 224, 224)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(input_sample)
    assert torch.get_num_threads() == 4  # this line only shows that Nano has applied thread control automatically :)

## InferenceOptimizer.quantize

For a model accelerated by ``InferenceOptimizer.quantize``, usage looks like the code below. Here we take ``bf16 + channels_last`` as an example.

In [5]:
from bigdl.nano.pytorch import InferenceOptimizer
bf16_model = InferenceOptimizer.quantize(model,
                                         precision='bf16',
                                         channels_last=True,
                                         thread_num=4)
input_sample = torch.rand(1, 3, 224, 224)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(input_sample)
    assert torch.get_num_threads() == 4  # this line only shows that Nano has applied thread control automatically :)
    assert output.dtype == torch.bfloat16  # this line only shows that Nano has applied the autocast context manager automatically :)

## InferenceOptimizer.optimize

By calling ``optimize()``, you get a set of accelerated models at once, and you can then obtain the model you want with ``InferenceOptimizer.get_model`` or ``InferenceOptimizer.get_best_model``. Usage looks like the code below. Here we take `openvino` as an example.

In [None]:
import torch
from pathlib import Path
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.datasets.utils import download_and_extract_archive
from torch.utils.data import Subset, DataLoader

def prepare_model_and_dataset(model_ft, val_size):
    DATA_URL = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"

    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    val_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    if not Path("data").exists():
        # download dataset
        download_and_extract_archive(url=DATA_URL, download_root="data", remove_finished=True)

    data_path = Path("data/cats_and_dogs_filtered")
    train_dataset = ImageFolder(data_path.joinpath("train"), transform=train_transform)
    val_dataset = ImageFolder(data_path.joinpath("validation"), transform=val_transform)

    indices = torch.randperm(len(val_dataset))
    val_dataset = Subset(val_dataset, indices=indices[:val_size])

    train_dataloader = DataLoader(dataset=train_dataset, batch_size=8, shuffle=True)
    val_dataloader = DataLoader(dataset=val_dataset, batch_size=8, shuffle=False)

    return train_dataset, val_dataset

train_dataset, val_dataset = prepare_model_and_dataset(model, val_size=500)

In [None]:
# To obtain the latency of a single sample, set batch_size=1
train_dataloader = DataLoader(train_dataset, batch_size=1)
val_dataloader = DataLoader(val_dataset)

from bigdl.nano.pytorch import InferenceOptimizer
optimizer = InferenceOptimizer()
optimizer.optimize(model=model,
                   training_data=train_dataloader,
                   thread_num=4,
                   latency_sample_num=30)

In [9]:
openvino_model = optimizer.get_model("openvino_fp32")
input_sample = torch.rand(1, 3, 224, 224)

with InferenceOptimizer.get_context(openvino_model):
    output = openvino_model(input_sample)
    assert torch.get_num_threads() == 4  # this line only shows that Nano has applied thread control automatically :)

In [10]:
accelerated_model, option = optimizer.get_best_model()
input_sample = torch.rand(1, 3, 224, 224)

with InferenceOptimizer.get_context(accelerated_model):
    output = accelerated_model(input_sample)
    assert torch.get_num_threads() == 4  # this line only shows that Nano has applied thread control automatically :)

## Advanced Usage: Multiple Models

``InferenceOptimizer.get_context(model=...)`` can be used for multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to `get_context`.

> 📝 **Note**
> 
> Here are the rules we use to resolve conflicts between multiple context managers:
>
> 1. If two context managers have different precisions (bf16 and non-bf16), we will return AutocastContextManager()
>
> 2. If only one context manager has thread_num, we will set thread_num to that value
>
> 3. If two context managers have different thread_num values, we will set thread_num to the larger one
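
The rules in the note above can be illustrated with a small pure-Python sketch. It is not BigDL-Nano's code: `ContextSpec`, `merge_specs`, and `merge_all` are hypothetical names introduced only for this illustration, and treating "both models are bf16" as also requiring autocast is an assumption.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class ContextSpec:
    """Hypothetical summary of what one model's context needs (not a BigDL-Nano class)."""
    use_bf16: bool = False
    thread_num: Optional[int] = None


def merge_specs(a: ContextSpec, b: ContextSpec) -> ContextSpec:
    # Rule 1: if the precisions differ (bf16 vs. non-bf16), use autocast.
    # Assumption: two bf16 specs also merge into an autocast context.
    use_bf16 = a.use_bf16 or b.use_bf16
    # Rule 2: if only one side sets thread_num, take that value.
    # Rule 3: if both sides set it, take the larger value.
    candidates = [n for n in (a.thread_num, b.thread_num) if n is not None]
    thread_num = max(candidates) if candidates else None
    return ContextSpec(use_bf16=use_bf16, thread_num=thread_num)


def merge_all(*specs: ContextSpec) -> ContextSpec:
    # Fold the pairwise rule over any number of models in a pipeline.
    return reduce(merge_specs, specs)


backbone = ContextSpec(use_bf16=True, thread_num=None)
head = ContextSpec(use_bf16=False, thread_num=4)
merged = merge_all(backbone, head)
```

In this sketch, `merged` asks for both autocast and four threads, which mirrors how a common context has to satisfy every model in the pipeline.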

Here is a simple example that shows the usage for a pipeline:

In [17]:
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)
    
    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # this line only shows that Nano has applied thread control automatically :)

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)