# Mixed-Precision PTQ - PyTorch MobileNetV2 on CIFAR100

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb)

## Overview

This tutorial demonstrates how to retrain and quantize a MobileNetV2 model on the CIFAR100 dataset. It starts by fine-tuning an ImageNet-pretrained MobileNetV2 on CIFAR100. The retrained model is then quantized with the Model Compression Toolkit (MCT). Specifically, this tutorial uses mixed-precision quantization, which assigns different bit-widths to different layers of the model according to how strongly quantizing each layer affects the model's output. Finally, the quantized model is evaluated and exported to an ONNX file.
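
To make "different bit-widths per layer" concrete, the sketch below computes the weights memory of a model when every layer uses 8 bits versus a hand-picked mixed assignment. The layer names and weight counts are hypothetical and only illustrate the arithmetic (memory = number of weights × bits / 8); in the tutorial, MCT chooses the per-layer bit-widths itself.

```python
# Illustrative weight counts per layer (hypothetical, not MobileNetV2's actual layers)
layer_weights = {"conv_1": 864, "conv_2": 4_608, "conv_3": 36_864, "classifier": 128_000}


def weights_memory_bytes(bit_widths: dict) -> float:
    """Weights memory in bytes when each layer is quantized with its own bit-width."""
    return sum(n * bit_widths[name] / 8 for name, n in layer_weights.items())


uniform_8bit = {name: 8 for name in layer_weights}
# Hand-picked assignment: keep the small layers at 8 bits, drop the large ones to 4 bits
mixed = {"conv_1": 8, "conv_2": 8, "conv_3": 4, "classifier": 4}

mem_8bit = weights_memory_bytes(uniform_8bit)
mem_mixed = weights_memory_bytes(mixed)
print(f"8-bit: {mem_8bit:.0f} B, mixed: {mem_mixed:.0f} B, ratio: {mem_mixed / mem_8bit:.3f}")
```

Lowering the bit-width of the largest layers saves the most memory, which is why a mixed-precision search has to weigh each layer's size against its sensitivity.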

## Summary

In this tutorial we will cover:
1. Retraining PyTorch MobileNetV2 on CIFAR100.
2. Quantizing the model using mixed-precision post-training quantization for the weights.
3. Evaluating and exporting the model to ONNX.

## Setup

First install the relevant packages and import them:

In [None]:
! pip install -q model-compression-toolkit
! pip install -q torch
! pip install -q torchvision

In [None]:
import copy
import tempfile

import torch
import torchvision
from torch import nn, optim
from torchvision import transforms
from tqdm import tqdm
import numpy as np
import random

import model_compression_toolkit as mct

In addition, let's set a seed so that the results are reproducible:

In [None]:
def seed_everything(seed_value):
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)


## Define functions for creating dataset loaders

We use two functions to create data loaders for the CIFAR100 dataset:

`get_cifar100_trainloader` - creates a data loader for the CIFAR100 training set, applying the given transformations and batch size. The training data is shuffled.

`get_cifar100_testloader` - similarly, creates a data loader for the CIFAR100 test set with the given transformations and batch size, without shuffling.

In [None]:

def get_cifar100_trainloader(dataset_folder, transform, train_batch_size):
    """
    Get CIFAR100 train loader.
    """
    trainset = torchvision.datasets.CIFAR100(root=dataset_folder, train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=train_batch_size, shuffle=True)
    return trainloader


def get_cifar100_testloader(dataset_folder, transform, eval_batch_size):
    """
    Get CIFAR100 test loader.
    """
    testset = torchvision.datasets.CIFAR100(root=dataset_folder, train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=eval_batch_size, shuffle=False)
    return testloader


## Evaluation helper function
Next, we create a helper function that computes a model's top-1 accuracy over a test loader (we will use it later on).

In [None]:

def evaluate(model, testloader, device):
    """
    Evaluate a model using a test loader.
    """
    model.to(device)
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_acc = (100 * correct / total)
    print('Accuracy: %.2f%%' % val_acc)
    return val_acc
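
The core of `evaluate` is a top-1 accuracy computation: `torch.max(outputs.data, 1)` returns the index of the highest-scoring class for every sample, which is then compared with the labels. The NumPy sketch below shows the same computation on a single toy batch (the logits and labels are made up); `evaluate` accumulates the counts over all batches before dividing.

```python
import numpy as np


def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Percentage of samples whose highest-scoring class equals the label."""
    predicted = logits.argmax(axis=1)  # same role as the indices returned by torch.max(outputs, 1)
    return 100.0 * float((predicted == labels).sum()) / labels.shape[0]


# Toy batch: 4 samples, 3 classes
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.8, 0.7],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 2, 2])
print(f"Accuracy: {top1_accuracy(logits, labels):.2f}%")  # third sample is misclassified
```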

## Fine-tuning MobileNetV2 to CIFAR100

We now create a function for the retraining phase of our model. This is a simple training scheme that runs for `retrain_num_epochs` epochs (20 in this tutorial), using SGD with momentum and a cross-entropy loss. The model is evaluated after each epoch, and the returned model is the one with the best observed accuracy.

In [None]:
def retrain(model, transform, device, args):
    trainloader = get_cifar100_trainloader(args.representative_dataset_dir,
                                           transform,
                                           args.retrain_batch_size)

    testloader = get_cifar100_testloader(args.representative_dataset_dir,
                                         transform,
                                         args.eval_batch_size)

    model.to(device)

    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(),
                          lr=args.retrain_lr,
                          momentum=args.retrain_momentum)

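    best_acc = 0.0
    best_state_dict = copy.deepcopy(model.state_dict())  # Fallback in case accuracy never improves
    # Training loop
    for epoch in range(args.retrain_num_epochs):
        # evaluate() switches the model to eval mode, so switch back to train mode at every epoch
        model.train()
        prog_bar = tqdm(enumerate(trainloader),
                        total=len(trainloader),
                        leave=True)

        print(f'Retrain epoch: {epoch}')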
        for i, data in prog_bar:
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward, backward, and update parameters
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

        val_acc = evaluate(model, testloader, device)

        # Check if this model has the best accuracy, and if so, save it
        if val_acc > best_acc:
            print(f'Best accuracy so far {val_acc}')
            best_acc = val_acc
            best_state_dict = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state_dict)
    return model
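
The `copy.deepcopy` around `model.state_dict()` matters: PyTorch's `state_dict()` is a shallow copy whose tensors reference the module's live parameters, so a plain reference would keep changing as `optimizer.step()` updates the weights. The sketch below reproduces the effect with a dict of NumPy arrays standing in for a state dict (an analogy, not PyTorch code).

```python
import copy

import numpy as np

# A dict of arrays stands in for a state_dict whose values reference live parameters
params = {"weight": np.zeros(3)}

shallow_snapshot = dict(params)        # new dict, but the same array objects
deep_snapshot = copy.deepcopy(params)  # new dict and new arrays

params["weight"] += 1.0  # in-place update, like optimizer.step()

print(shallow_snapshot["weight"])  # silently changed along with the "model"
print(deep_snapshot["weight"])     # still holds the saved values
```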

Let's create an object for the retraining parameters:

In [None]:
class RetrainArguments:
    def __init__(self):
        self.retrain_num_epochs = 20 # Number of epochs to retrain the model
        self.eval_batch_size = 32 # Batch size of test loader
        self.retrain_batch_size = 32 # Batch size of train loader
        self.retrain_lr = 0.001 # Learning rate to use during retraining
        self.retrain_momentum = 0.9 # SGD momentum to use during retraining
        self.representative_dataset_dir = './data' # Path to save the dataset (CIFAR100)

retrain_args = RetrainArguments()

To retrain MobileNetV2, we first load its ImageNet-pretrained weights, replace the last layer so that it outputs the 100 CIFAR100 classes, and then fine-tune the model using the retraining function defined above:

In [None]:
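# Load MobileNetV2 pretrained on ImageNet (the `weights` argument replaces the deprecated `pretrained=True`)
model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V1)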

# Modify last layer to match CIFAR-100 classes
model.classifier[1] = nn.Linear(model.last_channel, 100)

# Create preprocessing pipeline for training and evaluation
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to fit MobileNetV2 input
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # Normalize inputs to range [-1, 1]

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Fine-tune the model to adapt to CIFAR100
model = retrain(model,
                transform,
                device,
                retrain_args)
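
The preprocessing pipeline above maps pixel values to [-1, 1] in two steps: `ToTensor` scales 8-bit pixels from [0, 255] to [0.0, 1.0], and `Normalize` with mean 0.5 and std 0.5 computes `(x - 0.5) / 0.5` per channel. The NumPy sketch below checks this arithmetic on a few sample pixel values.

```python
import numpy as np

mean, std = 0.5, 0.5
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # sample 8-bit pixel values

scaled = pixels.astype(np.float32) / 255.0  # ToTensor: [0, 255] -> [0.0, 1.0]
normalized = (scaled - mean) / std          # Normalize: [0.0, 1.0] -> [-1.0, 1.0]
print(normalized.round(3))
```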

Finally, let's evaluate our new model:

In [None]:
# Evaluate the retrained model
testloader = get_cifar100_testloader(retrain_args.representative_dataset_dir,
                                     transform,
                                     retrain_args.eval_batch_size)
evaluate(model, testloader, device)

## Mixed-Precision Quantization Using MCT

Now we would like to quantize this model using MCT.
To do so, we need to define a representative dataset: a generator function that yields a list of model inputs (here, a single batch of images per list). In this example, it yields 10 batches:

In [None]:
# Create representative_data_gen function from the train dataset
trainloader = get_cifar100_trainloader(retrain_args.representative_dataset_dir,
                                       transform,
                                       retrain_args.retrain_batch_size)

num_calibration_iterations = 10
def representative_data_gen() -> list:
    for _ in range(num_calibration_iterations):
        yield [next(iter(trainloader))[0]]
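
Note that `next(iter(trainloader))` starts a new pass over the loader on every iteration and returns its first batch. Because the train loader shuffles, each yield is a random batch; with an unshuffled loader, the same batch would be returned every time. The sketch below uses a hypothetical `ToyLoader` class in place of a real `DataLoader` to show both cases, plus an alternative generator that draws consecutive batches from a single pass.

```python
import random


class ToyLoader:
    """Hypothetical stand-in for a DataLoader that yields (images, labels) batches."""

    def __init__(self, num_batches: int, shuffle: bool):
        self.num_batches = num_batches
        self.shuffle = shuffle

    def __iter__(self):
        order = list(range(self.num_batches))
        if self.shuffle:
            random.shuffle(order)  # a new order for every pass, like DataLoader(shuffle=True)
        for b in order:
            yield (f"images_{b}", f"labels_{b}")


def fresh_iterator_gen(loader, n_iters):
    """Same pattern as the tutorial: a new iterator (new pass) for every batch."""
    for _ in range(n_iters):
        yield [next(iter(loader))[0]]


def single_pass_gen(loader, n_iters):
    """Alternative: draw consecutive batches from a single pass over the loader."""
    loader_iter = iter(loader)
    for _ in range(n_iters):
        yield [next(loader_iter)[0]]


random.seed(0)
fixed_batches = [batch[0] for batch in fresh_iterator_gen(ToyLoader(100, shuffle=False), 10)]
shuffled_batches = [batch[0] for batch in fresh_iterator_gen(ToyLoader(100, shuffle=True), 10)]
single_pass_batches = [batch[0] for batch in single_pass_gen(ToyLoader(100, shuffle=False), 10)]

print(fixed_batches[:3])        # the first batch, repeated
print(shuffled_batches[:3])     # a random batch each time
print(single_pass_batches[:3])  # consecutive batches
```

The single-pass variant guarantees distinct batches regardless of shuffling, at the cost of keeping one iterator alive for the whole generator call.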

In addition, MCT optimizes the model for dedicated hardware. This is done using the target platform capabilities (TPC) (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/experimental_api_docs/modules/target_platform.html)). Here, we use the default PyTorch TPC:

In [None]:
# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference.
# Here, for example, we use the default platform that is attached to a Pytorch layers representation.
target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')

In order to use mixed-precision quantization we need to set some parameters in the CoreConfig that MCT uses:
1. Number of images - MCT uses images from the representative dataset to search for a suitable bit-width configuration. This parameter determines the number of images MCT will use. Using more images is expected to yield a more accurate bit-width configuration, but it also increases the search time, so there is a trade-off between runtime and expected accuracy.
2. Gradient weighting - A method to improve the bit-width configuration search (in exchange for longer search time). In this example, we will not use it.
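
As a rough intuition for how images help the search, the sketch below quantizes a single hypothetical linear layer at 8, 4, and 2 bits and measures the mean squared error between its float and quantized outputs over 32 random "images". The uniform symmetric per-tensor quantizer and the output-MSE metric are simplified stand-ins chosen for illustration; they are not MCT's actual quantizers or sensitivity metric. Averaging over more inputs makes such an estimate more reliable but costs more computation.

```python
import numpy as np


def quantize_symmetric(w: np.ndarray, n_bits: int) -> np.ndarray:
    """Uniform symmetric per-tensor quantization (simplified stand-in)."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / q_max
    return np.clip(np.round(w / scale), -q_max - 1, q_max) * scale


rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 8))  # hypothetical layer kernel
inputs = rng.normal(size=(32, 16))  # 32 hypothetical "images", as with num_of_images=32

float_out = inputs @ weights
mse_per_bits = {}
for bits in (8, 4, 2):
    quant_out = inputs @ quantize_symmetric(weights, bits)
    mse_per_bits[bits] = float(np.mean((float_out - quant_out) ** 2))
    print(f"{bits}-bit output MSE: {mse_per_bits[bits]:.6f}")
```

A layer whose output error grows sharply at low bit-widths is a poor candidate for compression, while a layer with small error can give up bits cheaply.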

In [None]:
# Create a mixed-precision quantization configuration with possible mixed-precision search options.
# MCT will search a mixed-precision configuration (namely, bit-width for each layer)
# and quantize the model according to this configuration.
# The candidate bit-widths for quantization are defined in the target platform model:
configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfigV2(
    num_of_images=32,
    use_grad_based_weights=False))

In addition, when using mixed precision we define the desired compression ratio. Here, we search for a mixed-precision configuration that compresses the weights to 75% of the weights memory of the 8-bit model:

In [None]:
# Get KPI information to constrain the model's memory size.
# Retrieve a KPI object with helpful information about each KPI metric,
# used to constrain the quantized model to the desired memory size.
kpi_data = mct.core.pytorch_kpi_data_experimental(model,
                                                  representative_data_gen,
                                                  configuration,
                                                  target_platform_capabilities=target_platform_cap)

# Set a constraint on the weights memory KPI metric.
# Create a KPI object to limit the returned model's size. Note that this value affects only the layers and attributes
# that are quantized (for example, the kernel of a Conv2d layer in PyTorch is affected by this value,
# while its bias is not).
# Example:
# weights_compression_ratio = 0.75 - about 75% of the model's weights memory size when quantized with 8 bits.
kpi = mct.core.KPI(kpi_data.weights_memory * 0.75)
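
A budget of 75% of the 8-bit weights memory corresponds to an average of 0.75 × 8 = 6 bits per quantized weight. The brute-force sketch below illustrates the constrained search on three hypothetical layers: enumerate every bit-width assignment, discard those that exceed the memory budget, and keep the one with the lowest total sensitivity. The layer sizes and sensitivity scores are made up, and MCT solves this search with its own, far more efficient method; exhaustive enumeration is only feasible for a handful of layers.

```python
from itertools import product

# Hypothetical layers: weight counts and a per-bit-width sensitivity score (lower is better).
layers = {
    "conv_a": {"params": 1_000, "sensitivity": {8: 0.00, 4: 0.30, 2: 2.00}},
    "conv_b": {"params": 4_000, "sensitivity": {8: 0.00, 4: 0.05, 2: 0.90}},
    "fc": {"params": 5_000, "sensitivity": {8: 0.00, 4: 0.02, 2: 0.60}},
}
candidate_bits = (8, 4, 2)


def memory_bytes(assignment: dict) -> float:
    return sum(layers[name]["params"] * bits / 8 for name, bits in assignment.items())


budget = 0.75 * memory_bytes({name: 8 for name in layers})  # 75% of the all-8-bit size

best = None
for bits_combo in product(candidate_bits, repeat=len(layers)):
    assignment = dict(zip(layers, bits_combo))
    if memory_bytes(assignment) > budget:
        continue  # violates the weights-memory constraint
    cost = sum(layers[name]["sensitivity"][bits] for name, bits in assignment.items())
    if best is None or cost < best[0]:
        best = (cost, assignment)

print(f"budget: {budget:.0f} B, best assignment: {best[1]}, cost: {best[0]:.2f}")
```

With these numbers, only the least sensitive layer (`fc`) drops to 4 bits, which exactly meets the budget.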

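Now we are ready to quantize the model with MCT. In this example, we use MCT's gradient-based post-training quantization (GPTQ), which, on top of the mixed-precision configuration, refines the weights quantization using the representative dataset. The argument passed to `get_pytorch_gptq_config` is the number of GPTQ training epochs: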

In [None]:
gptq_config = mct.gptq.get_pytorch_gptq_config(20)  # 20 GPTQ training epochs
quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization_experimental(
    model,
    representative_data_gen,
    target_kpi=kpi,
    core_config=configuration,
    gptq_config=gptq_config,
    gptq_representative_data_gen=representative_data_gen,
    target_platform_capabilities=target_platform_cap)

Finally, we evaluate the quantized model:

In [None]:
evaluate(quantized_model,
         testloader,
         device)

Now we can export the quantized model to ONNX. Note that `onnx` is not among MCT's requirements, so it needs to be installed first:

In [None]:
! pip install -q onnx

In [None]:
# Export quantized model to ONNX
import tempfile
_, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model
mct.exporter.pytorch_export_model(model=quantized_model, save_model_path=onnx_file_path,
                                  repr_dataset=representative_data_gen, target_platform_capabilities=target_platform_cap,
                                  serialization_format=mct.exporter.PytorchExportSerializationFormat.ONNX)

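## Conclusion

In this tutorial, we fine-tuned an ImageNet-pretrained MobileNetV2 on CIFAR100, quantized it with MCT using mixed-precision weights constrained to 75% of the 8-bit weights memory, evaluated the quantized model, and exported it to ONNX.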

Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
