# Quantize Model with Intel Neural Compressor
### Prepare Environment
Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for PyTorch. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.

Intel Neural Compressor (INC) is not installed with BigDL-Nano by default. To use it as your quantization backend, install it first:
```shell
pip install neural-compressor==1.11.0
```
When ONNXRuntime is used as the backend, INC also requires onnxruntime-extensions, in addition to onnx and onnxruntime themselves:
```bash
pip install onnx onnxruntime onnxruntime-extensions
```
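Before moving on, it can help to confirm that the packages named above are actually visible to the Python interpreter you will run the notebook with. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of BigDL-Nano or INC.

```python
from importlib import metadata

# pip distribution names mentioned in this section
REQUIRED = ["neural-compressor", "onnx", "onnxruntime", "onnxruntime-extensions"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` should be installed with the corresponding `pip` command before continuing.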


### Load Data
This demo uses the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), which contains 37 categories with roughly 200 images per class.

In [1]:
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet

# Resize, augment, and normalize with the ImageNet mean/std used by pretrained ResNet weights
data_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=.5, hue=.3),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
# Pass download=True on the first run if the dataset is not already under ./data/
data_set = OxfordIIITPet(root="./data/", transform=data_transforms)
data_loader = DataLoader(data_set, batch_size=32, shuffle=True)


### Custom Model
The model is a pretrained `torchvision.models.resnet18`. For more details, see the [torchvision documentation](https://pytorch.org/vision/0.12/generated/torchvision.models.resnet18.html?highlight=resnet18).

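In [2]:
import torch
import torch.nn as nn
from torchvision.models import resnet18
from bigdl.nano.pytorch import Trainer

# Define your own model: a pretrained ResNet-18 with a new classification head
model_ft = resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Replace the final layer so it outputs one logit per pet category
model_ft.fc = nn.Linear(num_ftrs, len(data_set.classes))
loss_ft = nn.CrossEntropyLoss()
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# Bundle the model, loss, and optimizer into a module the Trainer can fit
model = Trainer.compile(model_ft, loss_ft, optimizer_ft)

# (Optional) Something else, like training ...
trainer = Trainer(max_epochs=5)
# `train_dataloaders` replaces the `train_dataloader` argument deprecated in PyTorch Lightning v1.4
trainer.fit(model, train_dataloaders=data_loader)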

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name  | Type             | Params
-------------------------------------------
0 | model | ResNet           | 11.2 M
1 | loss  | CrossEntropyLoss | 0     
-------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.782    Total estimated model params size (MB)

Epoch 4: 100%|██████████| 115/115 [00:56<00:00,  2.07it/s, loss=0.453, v_num=37]


### Quantization without extra accelerator
To use INC as your quantization engine, set `accelerator` to `None` or `'onnxruntime'`.

Without an extra accelerator, `Trainer.quantize()` returns a PyTorch module quantized to the desired precision while meeting the accuracy requirement.
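
As background on what a lower precision means here, the sketch below shows generic affine INT8 quantization in plain Python: floats are mapped to 8-bit integer codes through a scale and a zero point, and mapped back with a small rounding error. It illustrates the general idea only and is not a description of how INC or `Trainer.quantize()` work internally; the function names `choose_qparams`, `quantize`, and `dequantize` are hypothetical.

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Pick a scale and zero point so the observed value range fits into [qmin, qmax]."""
    lo = min(min(values), 0.0)  # the range must include 0 so that 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integer codes: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    activations = [-1.0, 0.0, 0.4, 2.0]
    scale, zero_point = choose_qparams(activations)
    codes = quantize(activations, scale, zero_point)
    print(codes)
    print(dequantize(codes, scale, zero_point))
```

Each value that falls inside the calibrated range is recovered to within half a quantization step (`scale / 2`). Calibration data is what lets a quantization engine choose such ranges for each layer.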


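In [3]:
from torchmetrics.functional import accuracy

# Calibrate on data_loader and use accuracy as the tuning metric
q_model = trainer.quantize(model, calib_dataloader=data_loader, metric=accuracy)
# Run the quantized model on a two-image batch
batch = torch.stack([data_set[0][0], data_set[1][0]])
q_model(batch)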

2022-06-28 03:37:50 [INFO] Pass query framework capability elapsed time: 192.36 ms
2022-06-28 03:37:50 [INFO] Get FP32 model baseline.
2022-06-28 03:38:29 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-06-28_03-37-49/./history.snapshot.
2022-06-28 03:38:30 [INFO] FP32 baseline is: [Accuracy: 0.8446, Duration (seconds): 38.6896]
2022-06-28 03:38:32 [INFO] |********Mixed Precision Statistics*******|
2022-06-28 03:38:32 [INFO] +------------------------+--------+-------+
2022-06-28 03:38:32 [INFO] |        Op Type         | Total  |  INT8 |
2022-06-28 03:38:32 [INFO] +------------------------+--------+-------+
2022-06-28 03:38:32 [INFO] |  quantize_per_tensor   |   1    |

tensor([[12.9424, -3.7575, -2.7833, -3.4791,  0.1392,  9.8807, -2.3658,  4.1750,
         -0.1392,  1.2525,  0.9742,  6.8191, -1.9483, -1.2525, -0.8350, -4.5924,
         -3.4791, -4.0358, -2.9225, -2.6441,  7.6541,  2.0875, -5.2883, -2.5050,
         -1.2525, -2.6441,  3.2008,  4.1750, -3.8966, -4.1750, -0.9742,  3.7575,
          1.3917,  1.2525, -4.0358,  0.6958,  0.4175],
        [ 9.8807,  0.8350, -2.0875, -3.0616,  0.8350,  5.0099, -2.7833,  2.7833,
          2.5050,  2.5050,  2.9225,  4.3141, -3.4791, -1.5308, -0.1392, -5.2883,
         -2.0875, -2.6441, -3.8966, -4.1750,  0.8350,  5.1491, -4.7316, -2.3658,
         -2.2266, -1.2525,  0.4175,  2.7833, -3.7575, -4.1750, -0.4175,  3.7575,
          2.5050,  9.8807, -2.6441, -0.8350,  0.0000]])
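
The logits printed above appear to fall on a regular grid: each value is close to an integer multiple of about 0.1392. That is what one would expect if the final layer's output is dequantized from integer codes with a single scale, although this reading is an inference from the printed numbers and not something the log states. A minimal check, using the first eight printed values of the first row:

```python
# First eight logits of the first row, copied from the output above
row = [12.9424, -3.7575, -2.7833, -3.4791, 0.1392, 9.8807, -2.3658, 4.1750]

# Assumption: the smallest non-zero magnitude approximates the quantization step
step = min(abs(v) for v in row if v != 0.0)

multiples = [v / step for v in row]
nearest = [round(m) for m in multiples]
off_grid = [abs(m - n) for m, n in zip(multiples, nearest)]

print("step:", step)
print("nearest integer multiples:", nearest)
print("largest deviation from the grid:", max(off_grid))
```

The deviations stay small because the printed values are rounded to four decimals; an exact grid would only be visible at full precision.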

### Quantization with ONNXRuntime accelerator
With the ONNXRuntime accelerator, `Trainer.quantize()` returns a quantized model whose inference runs in the ONNXRuntime engine.

In [4]:
ort_q_model = trainer.quantize(model, accelerator='onnxruntime', calib_dataloader=data_loader, metric=accuracy)
ort_q_model(batch)

2022-06-28 03:39:10 [INFO] Get FP32 model baseline.
2022-06-28 03:39:56 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-06-28_03-37-49/./history.snapshot.
2022-06-28 03:39:56 [INFO] FP32 baseline is: [Accuracy: 0.8470, Duration (seconds): 45.8410]
2022-06-28 03:40:01 [INFO] |*******Mixed Precision Statistics******|
2022-06-28 03:40:01 [INFO] +----------------------+--------+-------+
2022-06-28 03:40:01 [INFO] | 

tensor([[12.8406, -3.8216, -2.9044, -3.5159, -0.1529,  9.9362, -2.2930,  4.2802,
         -0.3057,  1.3758,  0.4586,  7.0318, -1.9872, -1.0701, -0.6115, -4.4331,
         -3.3630, -4.2802, -2.9044, -2.4458,  7.9490,  1.9872, -5.1974, -2.4458,
         -1.3758, -2.7516,  3.3630,  4.1273, -3.8216, -3.9745, -0.9172,  3.6688,
          1.3758,  0.7643, -3.9745,  0.9172,  0.1529],
        [10.0891,  0.9172, -2.1401, -3.2102,  0.9172,  5.3503, -2.7516,  2.5987,
          2.5987,  2.5987,  3.2102,  4.2802, -3.6688, -1.6815, -0.3057, -5.3503,
         -2.1401, -2.5987, -4.1273, -4.2802,  0.7643,  5.1974, -4.8917, -2.4458,
         -2.1401, -1.0701,  0.4586,  2.7516, -3.8216, -4.2802, -0.4586,  3.8216,
          2.5987,  9.9362, -2.5987, -0.9172,  0.1529]])
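
The two quantized models produce similar but not identical logits for the same batch. A quick way to quantify the gap is to compare the printed values directly. The sketch below uses only the first eight logits of the first row from each output, so the `argmax` it reports is illustrative and not the full 37-class prediction; the helper names are hypothetical.

```python
# First eight logits of the first row from each output above
pytorch_int8 = [12.9424, -3.7575, -2.7833, -3.4791, 0.1392, 9.8807, -2.3658, 4.1750]
onnxruntime_int8 = [12.8406, -3.8216, -2.9044, -3.5159, -0.1529, 9.9362, -2.2930, 4.2802]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


print("max abs difference:", max_abs_diff(pytorch_int8, onnxruntime_int8))
print("same top index:", argmax(pytorch_int8) == argmax(onnxruntime_int8))
```

On these values the largest gap is roughly 0.29 and both backends rank the same entry highest. In a real comparison you would run the check on the full output tensors, for example with `torch.allclose` and a tolerance.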