# Quantize Model with Intel Neural Compressor
### Prepare Environment
Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for PyTorch. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.

By default, Intel Neural Compressor (INC) is not installed with BigDL-Nano, so if you decide to use it as your quantization backend, install it first:
```shell
pip install neural-compressor==1.11.0
```
When ONNXRuntime is used as the backend, INC additionally requires onnxruntime-extensions, along with onnxruntime and its dependencies:
```bash
pip install onnx onnxruntime onnxruntime-extensions
```
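
Before moving on, it can help to confirm that the optional packages are visible to the Python interpreter running the notebook. The snippet below is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper introduced for illustration, and the module names are the usual import names of the packages installed above (the pip package `neural-compressor` is imported as `neural_compressor`).

```python
from importlib.util import find_spec

# Import names of the optional packages used in this tutorial.
OPTIONAL_MODULES = ["neural_compressor", "onnx", "onnxruntime"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(OPTIONAL_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All optional quantization dependencies are importable.")
```

This only checks that the modules can be located; it does not verify that the installed versions match the ones pinned above.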


### Load Data
We use the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) for this demo. It contains 37 categories with roughly 200 images per class.

```python
import os
import torch
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet
from torch.utils.data.dataloader import DataLoader

train_transform = transforms.Compose([transforms.Resize(256),
                                      transforms.RandomCrop(224),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ColorJitter(brightness=.5, hue=.3),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
val_transform = transforms.Compose([transforms.Resize([224, 224]),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# Apply data augmentation to the training dataset only
train_dataset = OxfordIIITPet(root=".", transform=train_transform, download=True)
val_dataset = OxfordIIITPet(root=".", transform=val_transform)

# Hold out a random quarter of the samples for validation
indices = torch.randperm(len(train_dataset))
val_size = len(train_dataset) // 4
train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])
val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])

# Prepare the training data loader
train_dataloader = DataLoader(train_dataset, batch_size=32)

# Any non-empty DEV_RUN environment variable enables a quick test run
DEV_RUN = bool(os.environ.get('DEV_RUN', False))
```
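
One thing to keep in mind about the `DEV_RUN` line: `bool()` on any non-empty string is `True`, so `DEV_RUN=0` or `DEV_RUN=False` still enables the quick run. If stricter parsing is wanted, something along these lines could be used. This is only a sketch; `env_flag` and the `DEMO_FLAG` variable are hypothetical names introduced for illustration and are not part of BigDL-Nano.

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean flag.

    Unset -> default; "1", "true", "yes", "on" (case-insensitive) -> True;
    any other value -> False.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Demonstrate the difference on a throwaway variable.
os.environ["DEMO_FLAG"] = "0"
print(bool(os.environ.get("DEMO_FLAG", False)))  # non-empty string, so truthy
print(env_flag("DEMO_FLAG"))                     # parsed as "off"
```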


### Custom Model
For the model, we use a pretrained `torchvision.models.resnet18`. For more details, see the [torchvision documentation](https://pytorch.org/vision/0.12/generated/torchvision.models.resnet18.html?highlight=resnet18).

```python
import torch
from torchvision.models import resnet18
from bigdl.nano.pytorch import Trainer
from torchmetrics import Accuracy

model_ft = resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features

# Replace the classification head so that each output sample has size 37,
# one logit per pet category.
model_ft.fc = torch.nn.Linear(num_ftrs, 37)
loss_ft = torch.nn.CrossEntropyLoss()
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# Compile the model with its loss function and optimizer.
model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[Accuracy])
trainer = Trainer(max_epochs=5,
                  fast_dev_run=DEV_RUN)  # Run the model quickly when testing
trainer.fit(model, train_dataloaders=train_dataloader)

# Inference/prediction on two validation images
x = torch.stack([val_dataset[0][0], val_dataset[1][0]])
model_ft.eval()
y_hat = model_ft(x)
y_hat.argmax(dim=1)
```

```
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name  | Type             | Params
-------------------------------------------
0 | model | ResNet           | 11.2 M
1 | loss  | CrossEntropyLoss | 0
-------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.782    Total estimated model params size (MB)

Epoch 4: 100%|██████████| 87/87 [00:42<00:00,  2.08it/s, loss=0.308, v_num=19]

tensor([29, 18])
```
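
The cell above only predicts two images, and `val_dataset` is prepared earlier but not otherwise used. A rough way to measure top-1 accuracy over a whole data loader is sketched below in plain PyTorch. `top1_accuracy` is a hypothetical helper written for this illustration, not a BigDL-Nano API, and the toy loader stands in for the real validation data so that the snippet runs on its own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def top1_accuracy(model, dataloader):
    """Fraction of samples whose argmax prediction equals the label."""
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, targets in dataloader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.numel()
    return correct / total if total else 0.0


# Toy stand-in for a validation set: 8 samples, 4 features, 3 classes.
toy_loader = DataLoader(
    TensorDataset(torch.randn(8, 4), torch.randint(0, 3, (8,))), batch_size=4
)
toy_model = torch.nn.Linear(4, 3).eval()
print(top1_accuracy(toy_model, toy_loader))
```

In this notebook, the same function could be called as `top1_accuracy(model_ft, DataLoader(val_dataset, batch_size=32))` after `model_ft.eval()`.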

### Quantization without extra accelerator
To use INC as your quantization engine, set `accelerator` to `None` or `'onnxruntime'`.

With no extra accelerator (`accelerator=None`), `Trainer.quantize()` returns a PyTorch module.

```python
q_model = trainer.quantize(model, calib_dataloader=train_dataloader)
y_hat = q_model(x)
y_hat.argmax(dim=1)
```

```
2022-07-01 09:14:43 [INFO] Generate a fake evaluation function.
2022-07-01 09:14:43 [INFO] Pass query framework capability elapsed time: 183.81 ms
2022-07-01 09:14:43 [INFO] Get FP32 model baseline.
2022-07-01 09:14:43 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.
2022-07-01 09:14:43 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0000]
2022-07-01 09:14:46 [INFO] |********Mixed Precision Statistics*******|
2022-07-01 09:14:46 [INFO] +------------------------+--------+-------+
2022-07-01 09:14:46 [INFO] |        Op Type         | Total  |  INT8 |
2022-07-01 09:14:46 [INFO] +------------------------+--------+-------+

tensor([29, 18])
```
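
For intuition about what INT8 quantization with a calibration set involves, here is a minimal pure-Python sketch of textbook affine quantization: a scale and zero-point are derived from the observed value range and then used to map floats to 8-bit integers and back. This is an illustration under simplifying assumptions, not INC's actual algorithm, and `calibrate`, `quantize`, and `dequantize` are hypothetical names.

```python
def calibrate(values, qmin=-128, qmax=127):
    """Derive scale and zero-point from the min/max of calibration data."""
    lo = min(min(values), 0.0)  # the range must include 0 so that
    hi = max(max(values), 0.0)  # 0.0 maps exactly to an integer code
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 code, clamping to the representable range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an int8 code back to an approximate float."""
    return (q - zero_point) * scale


scale, zero_point = calibrate([-1.0, 0.5, 2.0])
for value in [-1.0, 0.5, 2.0, 10.0]:
    q = quantize(value, scale, zero_point)
    print(value, "->", q, "->", dequantize(q, scale, zero_point))
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it (such as `10.0` here) are clamped. This is one reason the calibration data should be representative of the real inputs.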

### Quantization with ONNXRuntime accelerator
With the ONNXRuntime accelerator, `Trainer.quantize()` returns a reduced-precision model that runs inference in the ONNXRuntime engine.

```python
ort_q_model = trainer.quantize(model, accelerator='onnxruntime', calib_dataloader=train_dataloader)
y_hat = ort_q_model(x)
y_hat.argmax(dim=1)
```

```
2022-07-01 09:14:48 [INFO] Generate a fake evaluation function.
2022-07-01 09:14:48 [INFO] Get FP32 model baseline.
2022-07-01 09:14:48 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.
2022-07-01 09:14:48 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0000]
2022-07-01 09:14:54 [INFO] |*******Mixed Precision Statistics******|
2022-07-01 09:14:54 [INFO] +-------

tensor([29, 18])
```
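
All three models predict the same classes for `x`, so the remaining question is whether quantization actually speeds up inference on your machine. A rough wall-clock comparison could be sketched as follows. `mean_latency` is a hypothetical helper using only the standard library, and the stand-in workload is there only so the snippet runs on its own.

```python
import time


def mean_latency(fn, *args, warmup=3, runs=20):
    """Average wall-clock seconds per call of fn(*args) after a warm-up."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs


# Stand-in workload so the sketch is self-contained.
print(f"{mean_latency(sum, range(10_000)) * 1e6:.1f} us per call")
```

In this notebook, the calls would be `mean_latency(model_ft, x)`, `mean_latency(q_model, x)`, and `mean_latency(ort_q_model, x)`. Results depend heavily on hardware, thread settings, and batch size, so treat them as indicative only.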