# PyTorch Model from torchvision - Quantization for IMX500

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/pytorch/pytorch_torchvision_classification_model_for_imx500.ipynb)

## Overview

In this tutorial, we will illustrate a basic and quick process of preparing a pre-trained model for deployment using MCT. 
We will use an existing pre-trained model from [torchvision](https://pytorch.org/vision/stable/models.html). The user can choose any torchvision model from this list.   

## Setup
### Install the relevant packages

In [None]:
!pip install -q torch
!pip install -q torchvision
!pip install -q onnx

Install MCT (if it’s not already installed). Additionally, in order to use all the necessary utility functions for this tutorial, we also copy [MCT tutorials folder](https://github.com/sony/model_optimization/tree/main/tutorials) and add it to the system path.

In [None]:
import sys
import importlib

if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit
!git clone https://github.com/sony/model_optimization.git temp_mct && mv temp_mct/tutorials . && \rm -rf temp_mct
sys.path.insert(0,"tutorials")

### Download ImageNet validation set
Download ImageNet dataset with only the validation split.

Note that for demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

This step may take several minutes...

In [None]:
import os
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !mv ILSVRC2012_devkit_t12.tar.gz imagenet/
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    !mv ILSVRC2012_img_val.tar imagenet/

## Model Quantization

### Download a Pre-Trained Model

In [43]:
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

model = mobilenet_v2(weights=MobileNet_V2_Weights)
model.eval()

Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /home/yardeny/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 28.8MB/s]


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

### Post training quantization using Model Compression Toolkit 

Now, we're all set to use MCT's post-training quantization. To begin, we'll define a representative dataset and proceed with the model quantization. Please note that, for demonstration purposes, we'll use the evaluation dataset as our representative dataset. We'll calibrate the model using 80 representative images, divided into 20 iterations of 'batch_size' images each. 

In [44]:
import model_compression_toolkit as mct
from model_compression_toolkit.core.pytorch.pytorch_device_config import get_working_device
from typing import Iterator, Tuple, List
from torch.utils.data import DataLoader
from torchvision import transforms, datasets


BATCH_SIZE = 4
n_iters = 20
device = get_working_device()

# Define transformations for the validation set
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

# Extract ImageNet validation dataset using torchvision "datasets" module
val_dataset = datasets.ImageNet(root='./imagenet', split='val', transform=val_transform)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

# Define representative dataset generator
def get_representative_dataset(n_iter: int, dataset_loader: Iterator[Tuple]):
    """
    This function creates a representative dataset generator. The generator yields numpy
        arrays of batches of shape: [Batch, H, W ,C].
    Args:
        n_iter: number of iterations for MCT to calibrate on
    Returns:
        A representative dataset generator
    """       
    def representative_dataset() -> Iterator[List]:
        ds_iter = iter(dataset_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]

    return representative_dataset

# Get representative dataset generator
representative_dataset_gen = get_representative_dataset(n_iter=n_iters, dataset_loader=val_loader)

# Perform post training quantization with the default configuration
quant_model, _ = mct.ptq.pytorch_post_training_quantization(model, representative_dataset_gen)
print('Quantized model is ready')

Statistics Collection: 20it [00:15,  1.30it/s]



Running quantization parameters search. This process might take some time, depending on the model size and the selected quantization methods.



Calculating quantization parameters: 100%|██████████| 201/201 [00:26<00:00,  7.52it/s]


Weights_memory: 5236192.0, Activation_memory: 2408448.0, Total_memory: 7644640.0, BOPS: 7804365562445824

Please run your accuracy evaluation on the exported quantized model to verify it's accuracy.
Checkout the FAQ and Troubleshooting pages for resolving common issues and improving the quantized model accuracy:
FAQ: https://github.com/sony/model_optimization/tree/main/FAQ.md
Quantization Troubleshooting: https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md
Quantized model is ready


### Model Export

Now, we can export the quantized model, ready for deployment, into a `.onnx` format file. Please ensure that the `save_model_path` has been set correctly. 

In [47]:
mct.exporter.pytorch_export_model(model=quant_model,
                                  save_model_path='./torchvision_qmodel.onnx',
                                  repr_dataset=representative_dataset_gen,
                                  onnx_opset_version=17)

Exporting onnx model with MCTQ quantizers: ./torchvision_qmodel.onnx


## Evaluation on ImageNet dataset

### Floating point model evaluation
Please ensure that the dataset path has been set correctly before running this code cell.

In [48]:
from tutorials.resources.utils.pytorch_tutorial_tools import classification_eval
# Evaluate the model on ImageNet
eval_results = classification_eval(model, val_loader)

# Print float model Accuracy results
print("Float model Accuracy: {:.4f}".format(round(100 * eval_results[0], 2)))

Model name: EfficientNet


Classification evaluation:   2%|▏         | 259/12500 [00:03<02:23, 85.51it/s]

Num of images: 1000, Accuracy: 90.3 %


Classification evaluation:   4%|▍         | 517/12500 [00:06<02:10, 91.56it/s]

Num of images: 2000, Accuracy: 85.7 %


Classification evaluation:   6%|▌         | 759/12500 [00:08<02:09, 91.00it/s]

Num of images: 3000, Accuracy: 83.7 %


Classification evaluation:   8%|▊         | 1014/12500 [00:11<02:11, 87.39it/s]

Num of images: 4000, Accuracy: 81.47 %


Classification evaluation:  10%|█         | 1268/12500 [00:14<01:57, 95.52it/s] 

Num of images: 5000, Accuracy: 83.42 %


Classification evaluation:  12%|█▏        | 1519/12500 [00:17<01:57, 93.70it/s]

Num of images: 6000, Accuracy: 83.65 %


Classification evaluation:  14%|█▍        | 1759/12500 [00:19<02:05, 85.79it/s]

Num of images: 7000, Accuracy: 84.31 %


Classification evaluation:  16%|█▌        | 2008/12500 [00:22<01:50, 95.21it/s] 

Num of images: 8000, Accuracy: 84.65 %


Classification evaluation:  18%|█▊        | 2267/12500 [00:25<01:47, 95.46it/s]

Num of images: 9000, Accuracy: 83.86 %


Classification evaluation:  20%|██        | 2512/12500 [00:28<01:50, 90.19it/s]

Num of images: 10000, Accuracy: 83.42 %


Classification evaluation:  22%|██▏       | 2758/12500 [00:30<01:44, 92.90it/s] 

Num of images: 11000, Accuracy: 83.56 %


Classification evaluation:  24%|██▍       | 3016/12500 [00:33<01:33, 101.15it/s]

Num of images: 12000, Accuracy: 83.35 %


Classification evaluation:  26%|██▌       | 3261/12500 [00:36<01:46, 86.67it/s] 

Num of images: 13000, Accuracy: 83.22 %


Classification evaluation:  28%|██▊       | 3514/12500 [00:38<01:45, 85.20it/s]

Num of images: 14000, Accuracy: 83.11 %


Classification evaluation:  30%|███       | 3765/12500 [00:41<01:44, 83.32it/s]

Num of images: 15000, Accuracy: 83.01 %


Classification evaluation:  32%|███▏      | 4010/12500 [00:44<01:27, 96.84it/s]

Num of images: 16000, Accuracy: 82.88 %


Classification evaluation:  34%|███▍      | 4266/12500 [00:47<01:32, 89.26it/s]

Num of images: 17000, Accuracy: 83.34 %


Classification evaluation:  36%|███▌      | 4511/12500 [00:50<01:29, 89.45it/s]

Num of images: 18000, Accuracy: 83.16 %


Classification evaluation:  38%|███▊      | 4764/12500 [00:52<01:21, 94.59it/s]

Num of images: 19000, Accuracy: 83.26 %


Classification evaluation:  40%|████      | 5012/12500 [00:55<01:18, 95.63it/s]

Num of images: 20000, Accuracy: 83.08 %


Classification evaluation:  42%|████▏     | 5265/12500 [00:58<01:21, 88.90it/s]

Num of images: 21000, Accuracy: 82.52 %


Classification evaluation:  44%|████▍     | 5513/12500 [01:01<01:14, 93.34it/s]

Num of images: 22000, Accuracy: 82.12 %


Classification evaluation:  46%|████▌     | 5766/12500 [01:03<01:12, 92.72it/s]

Num of images: 23000, Accuracy: 81.82 %


Classification evaluation:  48%|████▊     | 6014/12500 [01:06<01:19, 81.58it/s]

Num of images: 24000, Accuracy: 81.38 %


Classification evaluation:  50%|█████     | 6258/12500 [01:09<01:13, 84.65it/s]

Num of images: 25000, Accuracy: 80.77 %


Classification evaluation:  52%|█████▏    | 6512/12500 [01:12<01:12, 82.64it/s]

Num of images: 26000, Accuracy: 80.49 %


Classification evaluation:  54%|█████▍    | 6758/12500 [01:15<01:07, 85.59it/s]

Num of images: 27000, Accuracy: 80.2 %


Classification evaluation:  56%|█████▌    | 7012/12500 [01:18<00:59, 92.22it/s]

Num of images: 28000, Accuracy: 80.08 %


Classification evaluation:  58%|█████▊    | 7262/12500 [01:20<00:56, 92.30it/s] 

Num of images: 29000, Accuracy: 80.18 %


Classification evaluation:  60%|██████    | 7513/12500 [01:23<00:52, 95.25it/s]

Num of images: 30000, Accuracy: 79.96 %


Classification evaluation:  62%|██████▏   | 7759/12500 [01:26<00:50, 94.21it/s]

Num of images: 31000, Accuracy: 79.91 %


Classification evaluation:  64%|██████▍   | 8012/12500 [01:29<00:53, 83.89it/s] 

Num of images: 32000, Accuracy: 79.4 %


Classification evaluation:  66%|██████▌   | 8260/12500 [01:31<00:44, 96.04it/s]

Num of images: 33000, Accuracy: 79.21 %


Classification evaluation:  68%|██████▊   | 8511/12500 [01:34<00:43, 90.82it/s]

Num of images: 34000, Accuracy: 79.08 %


Classification evaluation:  70%|███████   | 8759/12500 [01:37<00:43, 85.54it/s]

Num of images: 35000, Accuracy: 78.97 %


Classification evaluation:  72%|███████▏  | 9019/12500 [01:40<00:36, 94.31it/s]

Num of images: 36000, Accuracy: 78.93 %


Classification evaluation:  74%|███████▍  | 9261/12500 [01:42<00:37, 86.96it/s]

Num of images: 37000, Accuracy: 78.88 %


Classification evaluation:  76%|███████▌  | 9509/12500 [01:45<00:31, 94.80it/s]

Num of images: 38000, Accuracy: 78.68 %


Classification evaluation:  78%|███████▊  | 9762/12500 [01:48<00:30, 88.44it/s]

Num of images: 39000, Accuracy: 78.58 %


Classification evaluation:  80%|████████  | 10020/12500 [01:51<00:25, 95.97it/s]

Num of images: 40000, Accuracy: 78.43 %


Classification evaluation:  82%|████████▏ | 10268/12500 [01:54<00:24, 89.86it/s]

Num of images: 41000, Accuracy: 78.31 %


Classification evaluation:  84%|████████▍ | 10512/12500 [01:56<00:21, 90.91it/s]

Num of images: 42000, Accuracy: 78.08 %


Classification evaluation:  86%|████████▌ | 10767/12500 [01:59<00:17, 97.33it/s]

Num of images: 43000, Accuracy: 77.94 %


Classification evaluation:  88%|████████▊ | 11014/12500 [02:02<00:16, 91.97it/s]

Num of images: 44000, Accuracy: 77.87 %


Classification evaluation:  90%|█████████ | 11258/12500 [02:04<00:13, 95.50it/s]

Num of images: 45000, Accuracy: 77.74 %


Classification evaluation:  92%|█████████▏| 11509/12500 [02:07<00:10, 92.71it/s]

Num of images: 46000, Accuracy: 77.62 %


Classification evaluation:  94%|█████████▍| 11759/12500 [02:10<00:08, 91.53it/s]

Num of images: 47000, Accuracy: 77.64 %


Classification evaluation:  96%|█████████▌| 12015/12500 [02:13<00:05, 94.17it/s] 

Num of images: 48000, Accuracy: 77.77 %


Classification evaluation:  98%|█████████▊| 12266/12500 [02:15<00:02, 82.37it/s]

Num of images: 49000, Accuracy: 77.52 %


Classification evaluation: 100%|██████████| 12500/12500 [02:18<00:00, 90.18it/s]

Num of images: 50000, Accuracy: 77.67 %
Model name: EfficientNet
Float model Accuracy: 77.6700





### Quantized model evaluation
We can evaluate the performance of the quantized model. There is a slight decrease in performance that can be further mitigated by either expanding the representative dataset or employing MCT's advanced quantization methods, such as GPTQ (Gradient-Based/Enhanced Post Training Quantization).

In [50]:
# Evaluate the quantized model on ImageNet
eval_results = classification_eval(quant_model, val_loader)

# Print quantized model Accuracy results
print("Quantized model Accuracy: {:.4f}".format(round(100 * eval_results[0], 2)))

Model name: EfficientNet


Classification evaluation:   2%|▏         | 253/12500 [00:10<08:54, 22.92it/s]

Num of images: 1000, Accuracy: 87.3 %


Classification evaluation:   4%|▍         | 502/12500 [00:20<08:46, 22.78it/s]

Num of images: 2000, Accuracy: 83.2 %


Classification evaluation:   6%|▌         | 754/12500 [00:30<07:33, 25.90it/s]

Num of images: 3000, Accuracy: 80.03 %


Classification evaluation:   8%|▊         | 1003/12500 [00:40<07:18, 26.21it/s]

Num of images: 4000, Accuracy: 77.65 %


Classification evaluation:  10%|▉         | 1249/12500 [00:50<06:48, 27.54it/s]

Num of images: 5000, Accuracy: 80.04 %


Classification evaluation:  12%|█▏        | 1504/12500 [01:00<06:39, 27.50it/s]

Num of images: 6000, Accuracy: 80.0 %


Classification evaluation:  14%|█▍        | 1753/12500 [01:10<06:52, 26.05it/s]

Num of images: 7000, Accuracy: 80.7 %


Classification evaluation:  16%|█▌        | 2005/12500 [01:20<06:24, 27.32it/s]

Num of images: 8000, Accuracy: 81.12 %


Classification evaluation:  18%|█▊        | 2255/12500 [01:29<06:55, 24.63it/s]

Num of images: 9000, Accuracy: 80.29 %


Classification evaluation:  20%|██        | 2504/12500 [01:39<06:27, 25.79it/s]

Num of images: 10000, Accuracy: 79.56 %


Classification evaluation:  22%|██▏       | 2753/12500 [01:48<06:49, 23.78it/s]

Num of images: 11000, Accuracy: 79.59 %


Classification evaluation:  24%|██▍       | 3005/12500 [01:58<06:05, 25.97it/s]

Num of images: 12000, Accuracy: 79.19 %


Classification evaluation:  26%|██▌       | 3254/12500 [02:08<06:31, 23.60it/s]

Num of images: 13000, Accuracy: 79.12 %


Classification evaluation:  28%|██▊       | 3503/12500 [02:18<06:01, 24.89it/s]

Num of images: 14000, Accuracy: 78.8 %


Classification evaluation:  30%|███       | 3755/12500 [02:28<05:33, 26.20it/s]

Num of images: 15000, Accuracy: 78.75 %


Classification evaluation:  32%|███▏      | 4004/12500 [02:37<05:09, 27.42it/s]

Num of images: 16000, Accuracy: 78.73 %


Classification evaluation:  34%|███▍      | 4253/12500 [02:47<05:27, 25.20it/s]

Num of images: 17000, Accuracy: 79.29 %


Classification evaluation:  36%|███▌      | 4505/12500 [02:57<05:22, 24.76it/s]

Num of images: 18000, Accuracy: 79.16 %


Classification evaluation:  38%|███▊      | 4754/12500 [03:06<04:53, 26.41it/s]

Num of images: 19000, Accuracy: 79.35 %


Classification evaluation:  40%|████      | 5003/12500 [03:16<04:34, 27.34it/s]

Num of images: 20000, Accuracy: 79.21 %


Classification evaluation:  42%|████▏     | 5255/12500 [03:26<04:21, 27.71it/s]

Num of images: 21000, Accuracy: 78.66 %


Classification evaluation:  44%|████▍     | 5504/12500 [03:35<04:41, 24.88it/s]

Num of images: 22000, Accuracy: 78.25 %


Classification evaluation:  46%|████▌     | 5753/12500 [03:45<04:10, 26.98it/s]

Num of images: 23000, Accuracy: 77.94 %


Classification evaluation:  48%|████▊     | 6004/12500 [03:55<04:01, 26.84it/s]

Num of images: 24000, Accuracy: 77.56 %


Classification evaluation:  50%|█████     | 6253/12500 [04:04<03:44, 27.85it/s]

Num of images: 25000, Accuracy: 76.9 %


Classification evaluation:  52%|█████▏    | 6505/12500 [04:14<03:56, 25.30it/s]

Num of images: 26000, Accuracy: 76.54 %


Classification evaluation:  54%|█████▍    | 6754/12500 [04:23<03:51, 24.78it/s]

Num of images: 27000, Accuracy: 76.27 %


Classification evaluation:  56%|█████▌    | 7003/12500 [04:33<03:43, 24.54it/s]

Num of images: 28000, Accuracy: 76.13 %


Classification evaluation:  58%|█████▊    | 7252/12500 [04:43<03:14, 27.04it/s]

Num of images: 29000, Accuracy: 76.32 %


Classification evaluation:  60%|██████    | 7504/12500 [04:53<03:09, 26.34it/s]

Num of images: 30000, Accuracy: 76.13 %


Classification evaluation:  62%|██████▏   | 7753/12500 [05:02<03:14, 24.41it/s]

Num of images: 31000, Accuracy: 76.09 %


Classification evaluation:  64%|██████▍   | 8003/12500 [05:12<02:34, 29.08it/s]

Num of images: 32000, Accuracy: 75.57 %


Classification evaluation:  66%|██████▌   | 8255/12500 [05:21<02:28, 28.54it/s]

Num of images: 33000, Accuracy: 75.42 %


Classification evaluation:  68%|██████▊   | 8503/12500 [05:30<02:19, 28.61it/s]

Num of images: 34000, Accuracy: 75.22 %


Classification evaluation:  70%|███████   | 8753/12500 [05:39<02:05, 29.80it/s]

Num of images: 35000, Accuracy: 75.11 %


Classification evaluation:  72%|███████▏  | 9004/12500 [05:48<01:58, 29.43it/s]

Num of images: 36000, Accuracy: 75.03 %


Classification evaluation:  74%|███████▍  | 9254/12500 [05:57<01:56, 27.84it/s]

Num of images: 37000, Accuracy: 74.9 %


Classification evaluation:  76%|███████▌  | 9505/12500 [06:06<01:49, 27.45it/s]

Num of images: 38000, Accuracy: 74.68 %


Classification evaluation:  78%|███████▊  | 9754/12500 [06:15<01:47, 25.58it/s]

Num of images: 39000, Accuracy: 74.56 %


Classification evaluation:  80%|████████  | 10004/12500 [06:25<01:28, 28.31it/s]

Num of images: 40000, Accuracy: 74.41 %


Classification evaluation:  82%|████████▏ | 10253/12500 [06:34<01:18, 28.47it/s]

Num of images: 41000, Accuracy: 74.32 %


Classification evaluation:  84%|████████▍ | 10503/12500 [06:43<01:12, 27.64it/s]

Num of images: 42000, Accuracy: 74.1 %


Classification evaluation:  86%|████████▌ | 10754/12500 [06:53<01:13, 23.81it/s]

Num of images: 43000, Accuracy: 73.94 %


Classification evaluation:  88%|████████▊ | 11005/12500 [07:02<00:57, 25.99it/s]

Num of images: 44000, Accuracy: 73.9 %


Classification evaluation:  90%|█████████ | 11255/12500 [07:12<00:46, 26.69it/s]

Num of images: 45000, Accuracy: 73.76 %


Classification evaluation:  92%|█████████▏| 11505/12500 [07:21<00:35, 28.21it/s]

Num of images: 46000, Accuracy: 73.67 %


Classification evaluation:  94%|█████████▍| 11755/12500 [07:30<00:27, 27.12it/s]

Num of images: 47000, Accuracy: 73.65 %


Classification evaluation:  96%|█████████▌| 12003/12500 [07:39<00:18, 26.94it/s]

Num of images: 48000, Accuracy: 73.71 %


Classification evaluation:  98%|█████████▊| 12253/12500 [07:48<00:08, 27.94it/s]

Num of images: 49000, Accuracy: 73.47 %


Classification evaluation: 100%|██████████| 12500/12500 [07:57<00:00, 26.15it/s]

Num of images: 50000, Accuracy: 73.64 %
Quantized model Accuracy: 73.6400





\
Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License