# Cross Layer Equalization and Bias Correction Example Code

This script uses AIMET to apply Cross Layer Equalization (CLE) and Bias Correction (BC) to a resnet18. The general procedure for quantization is to optionally modify the model with Cross Layer Equalization and/or Bias Correction, then use AIMET's QuantizationSimModel to compute new encodings, and finally finetune the model. Here is an overview of each feature this notebook showcases.

**Cross Layer Equalization**  
1. Batch Norm Folding: folding the parameters of each batch norm layer into the weights and bias of the preceding convolutional layer, so that the batch norm layer can be removed
2. Cross-Layer Scaling: rescaling the output channels of one convolutional layer and the matching input channels of the next layer so that their per-channel weight ranges become equal
3. High Bias Folding: absorbing large biases of a layer (which cross-layer scaling can amplify) into the biases of the following layer
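To make cross-layer scaling concrete, here is a minimal NumPy sketch for two fully connected layers separated by a ReLU (fully connected layers stand in for convolutions to keep it short). It uses the per-channel scale factor $s_i = \sqrt{r^{(1)}_i r^{(2)}_i} / r^{(2)}_i$ from the CLE paper listed under Resources, where $r^{(1)}_i$ is the weight range of output channel $i$ of the first layer and $r^{(2)}_i$ the range of input channel $i$ of the second. The helper `cross_layer_scale` is hypothetical and is not AIMET's implementation.

```python
import numpy as np

def cross_layer_scale(w1, b1, w2):
    """Equalize per-channel weight ranges of two consecutive layers (hypothetical helper).

    w1: (C, K1) weights of the first layer, one row per output channel
    b1: (C,)    bias of the first layer
    w2: (K2, C) weights of the second layer, one column per input channel
    """
    r1 = np.abs(w1).max(axis=1)      # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)      # range of each input channel of layer 2
    s = np.sqrt(r1 * r2) / r2        # per-channel scale factor s_i (> 0)
    w1_scaled = w1 / s[:, None]      # divide output channel i of layer 1 by s_i
    b1_scaled = b1 / s               # the bias of channel i is scaled the same way
    w2_scaled = w2 * s[None, :]      # multiply input channel i of layer 2 by s_i
    return w1_scaled, b1_scaled, w2_scaled

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
# Channel ranges that differ by two orders of magnitude, which hurts per-tensor quantization
w1 = rng.normal(size=(4, 3)) * np.array([[10.0], [0.1], [1.0], [5.0]])
b1 = rng.normal(size=4)
w2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

w1s, b1s, w2s = cross_layer_scale(w1, b1, w2)

# The output is unchanged because ReLU(v / s) == ReLU(v) / s for s > 0
print(np.allclose(w2 @ relu(w1 @ x + b1), w2s @ relu(w1s @ x + b1s)))  # True
# After scaling, both layers have the same per-channel range sqrt(r1 * r2)
print(np.allclose(np.abs(w1s).max(axis=1), np.abs(w2s).max(axis=0)))  # True
```

The equivalence relies on ReLU being positive-homogeneous, which is why CLE is applied to pairs of layers joined by ReLU-like activations.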

**Bias Correction**  
Bias Correction compensates for the shift in the mean of a layer's outputs that weight quantization introduces, so that the expected output of the layer is the same before and after quantization. For a layer with original weights $W$, quantized weights $\tilde{W}$, and input $x$, the expected output shifts by $(\tilde{W} - W)\,\mathbb{E}[x]$; Bias Correction subtracts this expected difference from the layer's bias.
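The correction can be sketched empirically for a single fully connected layer with NumPy: estimate $\mathbb{E}[x]$ from sample inputs, compute the expected output shift, and subtract it from the bias. The helpers `quantize_weights` (a simple symmetric per-tensor quantizer) and `correct_bias_linear` are hypothetical stand-ins, not AIMET's `correct_bias`, which works on the convolutional layers of a real model.

```python
import numpy as np

def quantize_weights(w, bw=8):
    """Symmetric per-tensor quantize-dequantize (hypothetical stand-in for a real quantizer)."""
    scale = np.abs(w).max() / (2 ** (bw - 1) - 1)
    return np.round(w / scale) * scale

def correct_bias_linear(w, b, w_q, x_samples):
    """Return a bias that removes the expected output shift caused by quantizing w.

    x_samples: (N, K) sample inputs to the layer, used to estimate E[x]
    """
    mean_x = x_samples.mean(axis=0)       # estimate of E[x]
    expected_shift = (w_q - w) @ mean_x   # E[W_q x] - E[W x]
    return b - expected_shift

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
b = rng.normal(size=8)
x = rng.normal(loc=2.0, size=(1000, 16))  # inputs with a non-zero mean, so the shift matters

w_q = quantize_weights(w, bw=4)           # coarse 4-bit quantization makes the shift visible
b_c = correct_bias_linear(w, b, w_q, x)

mean_fp = (x @ w.T + b).mean(axis=0)              # mean output of the original layer
mean_q = (x @ w_q.T + b).mean(axis=0)             # mean output with quantized weights
mean_q_corrected = (x @ w_q.T + b_c).mean(axis=0) # mean output after bias correction
print(np.abs(mean_q - mean_fp).max())             # mean shift before correction
print(np.abs(mean_q_corrected - mean_fp).max())   # close to 0 after correction
```

Because the same samples are used to estimate $\mathbb{E}[x]$ and to measure the means, the corrected mean matches the original up to floating-point error; on unseen data it matches only approximately.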


#### The example code shows the following:
1. Instantiate Data Pipeline for evaluation
2. Load the pretrained resnet18 PyTorch model
3. Calculate Model accuracy
    * 3.1. Calculate floating point accuracy
    * 3.2. Calculate Quant Simulator accuracy
4. Apply AIMET CLE and BC
    * 4.1. Apply AIMET CLE and calculate QuantSim accuracy
    * 4.2. Apply AIMET BC and calculate QuantSim accuracy


In [1]:
import warnings
warnings.filterwarnings("ignore", ".*param.*")

import os
import copy
import argparse
from typing import List
from datetime import datetime
from functools import partial
import torch
from torchvision.models import resnet18

In [2]:
# AIMET Imports for Quantization
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel, QuantParams
from aimet_torch.bias_correction import correct_bias
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.batch_norm_fold import fold_all_batch_norms


2022-01-19 15:38:15,510 - root - INFO - aimetpro-1.20.0_Build_Id_0.139.0.1838.torch-gpu-universal


In [3]:
# Data Pipeline Imports
from Examples.common import image_net_config
from Examples.torch.utils.image_net_evaluator import ImageNetEvaluator
from Examples.torch.utils.image_net_trainer import ImageNetTrainer
from Examples.torch.utils.image_net_evaluator import ImageNetDataLoader

## Setting Up Our Config Dictionary

The config dictionary expects the following parameters:

1. **dataset_dir:** Path to a directory containing the ImageNet dataset. This folder should contain a 'train' subfolder for the training dataset and a 'val' subfolder for the validation dataset.
2. **use_cuda:** A boolean indicating whether to run the quantization on a GPU.
3. **logdir:** Path to a directory for logging.

To see when each parameter is used, look for `config[...]` in the cells below.  
**Note:** You will have to replace the dataset_dir path with the path to your own ImageNet/TinyImageNet dataset.
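Since a wrong dataset_dir is an easy mistake to make, a quick sanity check of the config can help before running the heavier cells. The helper `check_config` below is hypothetical (it is not part of AIMET or the example utilities); it only checks that the three keys listed above are present, that use_cuda is a boolean, and that dataset_dir has 'train' and 'val' subfolders.

```python
import os
import tempfile

def check_config(config: dict) -> list:
    """Return a list of problems found in the config dictionary (empty if none)."""
    problems = []
    for key in ("dataset_dir", "use_cuda", "logdir"):
        if key not in config:
            problems.append(f"missing key: {key}")
    dataset_dir = config.get("dataset_dir")
    if dataset_dir is not None:
        # The dataset folder is expected to hold 'train' and 'val' subfolders
        for split in ("train", "val"):
            path = os.path.join(dataset_dir, split)
            if not os.path.isdir(path):
                problems.append(f"missing subfolder: {path}")
    if "use_cuda" in config and not isinstance(config["use_cuda"], bool):
        problems.append("use_cuda must be a bool")
    return problems

# Example: a temporary dataset directory that only has a 'val' subfolder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "val"))
    print(check_config({"dataset_dir": root, "use_cuda": False, "logdir": "logs"}))
```

Calling `check_config(config)` after the next cell and asserting that it returns an empty list would stop the notebook early with a readable message.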

In [None]:
config = {'dataset_dir': "path/to/dataset",
          'use_cuda': True,
          'logdir': os.path.join("benchmark_output", "cle_bc_"+datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))}

os.makedirs(config['logdir'], exist_ok=True)

## 1. Instantiate Data Pipeline

The ImageNetDataPipeline class provides a data loader and evaluates a model on the dataset directory. For more detail on how it works, see the relevant files under Examples/torch/utils.

The data pipeline class is simply a template for the user to follow. The methods for this class can be replaced by the user to fit their needs.

In [5]:
class ImageNetDataPipeline:
    """
    Provides APIs for model quantization using evaluation and finetuning.
    """

    def __init__(self, config):
        """
        :param config: Dictionary with the keys 'dataset_dir', 'use_cuda', and 'logdir'.
        """
        self._config = config

    def data_loader(self):
        """
        :return: Validation-set data loader created by ImageNetDataLoader
        """
        
        data_loader = ImageNetDataLoader(is_training=False, images_dir=self._config["dataset_dir"],
                                         image_size=image_net_config.dataset['image_size']).data_loader

        return data_loader
    
    def evaluate(self, model: torch.nn.Module, iterations: int = None, use_cuda: bool = False) -> float:
        """
        Evaluate the specified model using the specified number of samples from the validation set.
        :param model: The model to be evaluated.
        :param iterations: The number of batches of the dataset.
        :param use_cuda: If True then use a GPU for inference.
        :return: The top-1 accuracy (in percent) on the evaluated batches.
        """

        # Your code goes here

        evaluator = ImageNetEvaluator(self._config['dataset_dir'], image_size=image_net_config.dataset['image_size'],
                                      batch_size=image_net_config.evaluation['batch_size'],
                                      num_workers=image_net_config.evaluation['num_workers'])

        return evaluator.evaluate(model, iterations, use_cuda)
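The iterations argument of evaluate limits how many batches are processed, and None means the whole validation set (as the "No value of iteration is provided" log message below shows). As an illustration of how a user might replace this method, here is a framework-agnostic sketch. `evaluate_top1` is a hypothetical helper, the "model" is any callable that returns one score vector per input, and this is not the implementation inside ImageNetEvaluator.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Batch = Tuple[List[Sequence[float]], List[int]]  # (inputs, labels)

def evaluate_top1(model: Callable[[List[Sequence[float]]], List[Sequence[float]]],
                  batches: Iterable[Batch],
                  iterations: Optional[int] = None) -> float:
    """Top-1 accuracy (in percent) over the first `iterations` batches, or all batches if None."""
    if iterations is not None:
        batches = islice(batches, iterations)  # stop after `iterations` batches
    correct = total = 0
    for inputs, labels in batches:
        scores = model(inputs)                 # one score vector per input
        for score, label in zip(scores, labels):
            predicted = max(range(len(score)), key=score.__getitem__)  # argmax
            correct += int(predicted == label)
            total += 1
    return 100.0 * correct / total if total else 0.0

def identity_model(inputs):
    """Toy 'model' that treats each input as its own score vector."""
    return inputs

batches = [([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
           ([[0.3, 0.7]], [0])]                 # prediction wrong
print(evaluate_top1(identity_model, batches))                # 2 of 3 correct
print(evaluate_top1(identity_model, batches, iterations=1))  # first batch only
```

Keeping the `iterations` parameter in the second position matters: compute_encodings later passes num_batches to this method positionally.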

## 2. Load the Model, Initialize DataPipeline

The next section initializes the model and the data pipeline for quantization. Before quantizing the model, we calculate the original floating point (FP32) accuracy of the model on the dataset provided.

In [6]:
data_pipeline = ImageNetDataPipeline(config)

model = resnet18(pretrained=True)
if config['use_cuda']:
    if torch.cuda.is_available():
        model.to(torch.device('cuda'))
    else:
        raise Exception("use_cuda is True but cuda is unavailable")
model.eval()

accuracy = data_pipeline.evaluate(model, use_cuda=config['use_cuda'])
print("Original Model Accuracy: ", accuracy)

2022-01-19 15:38:22,046 - Dataloader - INFO - Dataset consists of 1000 images in 1000 classes
2022-01-19 15:38:22,051 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2022-01-19 15:38:22,052 - Eval - INFO - Evaluating nn.Module for 4 iterations with batch_size 256


100% (4 of 4) |##########################| Elapsed Time: 0:00:01 Time:  0:00:01


2022-01-19 15:38:26,540 - Eval - INFO - Avg accuracy Top 1: 68.655710 Avg accuracy Top 5: 88.574219 on validation Dataset
Original Model Accuracy:  68.65571022033691


## 3. Quantization Simulator

The next cells are for the actual quantization step. The quantization parameters are specified in the following cell:

1. **quant_scheme**: The scheme used to quantize the model. Choose either post_training_tf or post_training_tf_enhanced.

2. **rounding_mode**: The rounding mode used for quantization. There are two possible choices here: 'nearest' or 'stochastic'.

3. **default_output_bw**: The bitwidth of the activation tensors. The value of this should be a power of 2, less than 32.

4. **default_param_bw**: The bitwidth of the parameter tensors. The value of this should be a power of 2, less than 32.

5. **num_batches**: The number of batches passed through the model while computing the quantization encodings. Only a few batches are used here (5 in the typical setting, 1 in the test setting) to speed up the process; the images in these batches should still be enough to compute representative encodings.
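To illustrate what the bitwidth and rounding_mode parameters control, here is a plain-Python sketch of a min/max ("tf"-style) asymmetric quantizer with nearest and stochastic rounding. It is a simplified model, not AIMET's implementation: post_training_tf_enhanced additionally searches for a range that reduces quantization noise instead of using the raw min/max. The helpers `compute_encoding` and `quantize_dequantize`, and the sign convention for the offset, are assumptions made for this sketch.

```python
import math
import random

def compute_encoding(x_min: float, x_max: float, bw: int):
    """Min/max asymmetric encoding: return (scale, offset) for a `bw`-bit integer grid.

    The range is widened to include 0 so that zero is exactly representable.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    num_steps = 2 ** bw - 1
    scale = max((x_max - x_min) / num_steps, 1e-12)  # guard against an all-zero range
    offset = round(x_min / scale)                    # integer offset, always <= 0 here
    return scale, offset

def quantize_dequantize(x: float, scale: float, offset: int, bw: int,
                        rounding: str = "nearest") -> float:
    """Map x onto the integer grid and back to a float, as a quantization simulator does."""
    v = x / scale
    if rounding == "nearest":
        q = round(v)                          # note: Python's round() sends halves to even
    elif rounding == "stochastic":
        q = math.floor(v + random.random())   # rounds up with probability frac(v)
    else:
        raise ValueError(f"unknown rounding mode: {rounding}")
    q = min(max(q - offset, 0), 2 ** bw - 1)  # clamp to the integer grid [0, 2^bw - 1]
    return (q + offset) * scale

scale, offset = compute_encoding(-1.0, 3.0, bw=8)
print(scale, offset)
print(quantize_dequantize(1.2345, scale, offset, bw=8))                  # nearest
print(quantize_dequantize(1.2345, scale, offset, bw=8, rounding="stochastic"))
```

With nearest rounding the error for values inside the range is at most scale / 2; stochastic rounding has larger per-value error but is unbiased on average.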

In [7]:
quant_scheme = QuantScheme.post_training_tf_enhanced

rounding_mode = 'nearest'

default_output_bw = 8

default_param_bw = 8

#Uncomment one of the following lines
# num_batches = 5 #Typical
num_batches = 1 #Test

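We now set up the quantization simulator and compute encodings for the model. The resulting quantized (INT8) model is then evaluated on the dataset. The evaluate function from the data pipeline is passed to compute_encodings as the forward-pass callback: it runs num_batches batches through the model so that the quantization encodings can be computed.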
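The way compute_encodings receives the data pipeline's evaluate can be sketched with stand-ins. As AIMET's documentation describes it, the callback is called with the simulation model and forward_pass_callback_args; `functools.partial` fixes use_cuda, so num_batches lands in the iterations parameter. Both functions below are hypothetical stand-ins that only record their arguments; they are not AIMET code.

```python
from functools import partial

def evaluate(model, iterations=None, use_cuda=False):
    """Stand-in for ImageNetDataPipeline.evaluate that just records its arguments."""
    return {"model": model, "iterations": iterations, "use_cuda": use_cuda}

def compute_encodings(model, forward_pass_callback, forward_pass_callback_args):
    """Hypothetical stand-in for QuantizationSimModel.compute_encodings.

    The callback is invoked as callback(model, args); in the real API the
    quantizers observe the activations produced during this forward pass.
    """
    return forward_pass_callback(model, forward_pass_callback_args)

callback = partial(evaluate, use_cuda=True)
print(compute_encodings("sim_model", callback, 5))
# {'model': 'sim_model', 'iterations': 5, 'use_cuda': True}
```

In the notebook, `data_pipeline.evaluate` is a bound method, so `self` is already filled in and the simulation model becomes the first explicit argument, exactly as with the stand-in above.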

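It is customary to fold batch norms before quantization; however, the Cross Layer Equalization API (equalize_model) expects a model whose batch norms have not been folded, because it folds them itself. For this reason, we fold the batch norms on a copy of the model and quantize that copy, keeping the original model for CLE.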
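Folding a batch norm into the preceding layer is exact in inference mode, because the batch norm is then a fixed per-channel affine map; the log of fold_all_batch_norms below lists the (Conv, BatchNormalization) pairs it handled. The NumPy sketch below shows the arithmetic for one layer using the hypothetical helper `fold_bn_into_conv`; it is not AIMET's code, and eps=1e-5 is assumed to match PyTorch's BatchNorm2d default.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm that follows a conv/linear layer into that layer.

    w: (C_out, ...) weights, b: (C_out,) bias; BN parameters are per output channel.
    BN(W x + b) = gamma * (W x + b - mean) / sqrt(var + eps) + beta
    """
    factor = gamma / np.sqrt(var + eps)                         # per-output-channel multiplier
    w_folded = w * factor.reshape(-1, *([1] * (w.ndim - 1)))    # scale each output channel
    b_folded = beta + (b - mean) * factor
    return w_folded, b_folded

rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)   # running statistics
x = rng.normal(size=3)

y_bn = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta  # layer followed by BN
w_f, b_f = fold_bn_into_conv(w, b, gamma, beta, mean, var)
print(np.allclose(y_bn, w_f @ x + b_f))  # True
```

The same reshape handles 4-D convolution weights of shape (C_out, C_in, kH, kW), since only the output-channel axis is scaled.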

In [None]:
dummy_input = torch.rand(1, 3, 224, 224)
if config['use_cuda']:
    dummy_input = dummy_input.to(torch.device('cuda'))


BN_folded_model = copy.deepcopy(model)
_ = fold_all_batch_norms(BN_folded_model, input_shapes=(1, 3, 224, 224))

quantizer = QuantizationSimModel(model=BN_folded_model,
                                 quant_scheme=quant_scheme,
                                 dummy_input=dummy_input,
                                 rounding_mode=rounding_mode,
                                 default_output_bw=default_output_bw,
                                 default_param_bw=default_param_bw)

quantizer.compute_encodings(forward_pass_callback=partial(data_pipeline.evaluate,
                                                          use_cuda=config['use_cuda']),
                            forward_pass_callback_args=num_batches)

# Calculate quantized (INT8) accuracy of the BN-folded model (before CLE)
accuracy = data_pipeline.evaluate(quantizer.model, use_cuda=config['use_cuda'])
print("Quantized (INT8) Model Top-1 Accuracy: ", accuracy)

2022-01-19 15:38:26,875 - Utils - INFO - ...... subset to store [Conv_0, BatchNormalization_1]
2022-01-19 15:38:26,876 - Utils - INFO - ...... subset to store [Conv_4, BatchNormalization_5]
2022-01-19 15:38:26,876 - Utils - INFO - ...... subset to store [Conv_7, BatchNormalization_8]
2022-01-19 15:38:26,877 - Utils - INFO - ...... subset to store [Conv_11, BatchNormalization_12]
2022-01-19 15:38:26,877 - Utils - INFO - ...... subset to store [Conv_14, BatchNormalization_15]
2022-01-19 15:38:26,878 - Utils - INFO - ...... subset to store [Conv_18, BatchNormalization_19]
2022-01-19 15:38:26,878 - Utils - INFO - ...... subset to store [Conv_21, BatchNormalization_22]
2022-01-19 15:38:26,878 - Utils - INFO - ...... subset to store [Conv_27, BatchNormalization_28]
2022-01-19 15:38:26,879 - Utils - INFO - ...... subset to store [Conv_30, BatchNormalization_31]
2022-01-19 15:38:26,879 - Utils - INFO - ...... subset to store [Conv_34, BatchNormalization_35]

100% (1 of 1) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00


2022-01-19 15:38:51,812 - Eval - INFO - Avg accuracy Top 1: 75.390625 Avg accuracy Top 5: 94.531250 on validation Dataset
2022-01-19 15:38:52,719 - Dataloader - INFO - Dataset consists of 1000 images in 1000 classes
2022-01-19 15:38:52,721 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2022-01-19 15:38:52,722 - Eval - INFO - Evaluating nn.Module for 4 iterations with batch_size 256


 50% (2 of 4) |#############             | Elapsed Time: 0:00:21 ETA:   0:00:21

## 4.1 Cross Layer Equalization

The next cell performs cross-layer equalization on the model with equalize_model. As noted before, it folds batch norms, applies cross-layer scaling, and then folds high biases.
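High bias folding can be sketched in the same spirit. Assume layer 1 -> ReLU -> layer 2 and a per-channel amount c that the pre-activations of layer 1 rarely fall below. Subtracting c from layer 1's bias and adding W2 c to layer 2's bias leaves the output unchanged whenever the pre-activations stay at or above c. The choice c = max(0, beta - 3|gamma|) from batch norm statistics follows the heuristic in the CLE paper listed under Resources (the absolute value is an assumption that keeps the rule sensible for negative gamma); `absorb_high_bias` is a hypothetical helper, not AIMET's implementation.

```python
import numpy as np

def absorb_high_bias(b1, w2, b2, bn_beta, bn_gamma):
    """Move part of layer 1's bias into layer 2's bias (layer1 -> ReLU -> layer2).

    c = max(0, beta - 3 * |gamma|): pre-activations are assumed to rarely fall below it.
    """
    c = np.maximum(0.0, bn_beta - 3.0 * np.abs(bn_gamma))
    return b1 - c, b2 + w2 @ c

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
w1 = rng.uniform(-0.1, 0.1, size=(3, 2))
b1 = np.array([5.0, 0.2, 4.0])             # channel 0 has a large bias
gamma = np.array([0.5, 1.0, 3.0])          # hypothetical BN scale per channel
w2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)
x = rng.uniform(-1.0, 1.0, size=2)         # keeps |w1 @ x| <= 0.2, so pre-activations stay near b1

# The pre-activations are close to b1 here, so b1 stands in for the BN mean beta
b1_new, b2_new = absorb_high_bias(b1, w2, b2, bn_beta=b1, bn_gamma=gamma)
print(b1_new)
print(np.allclose(w2 @ relu(w1 @ x + b1) + b2, w2 @ relu(w1 @ x + b1_new) + b2_new))  # True
```

Only channel 0 changes (its bias drops from 5.0 to 1.5), which shrinks the activation range that the quantizer for layer 1's output has to cover.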

In [None]:
# This API will equalize the model in-place
equalize_model(model, input_shapes=(1, 3, 224, 224))

Then, the model is quantized, and the accuracy is noted. This is done before the bias correction step in order to measure the individual impacts of each technique.

In [None]:
dummy_input = torch.rand(1, 3, 224, 224)
if config['use_cuda']:
    dummy_input = dummy_input.to(torch.device('cuda'))

cle_quantizer = QuantizationSimModel(model=model,
                                     quant_scheme=quant_scheme,
                                     dummy_input=dummy_input,
                                     rounding_mode=rounding_mode,
                                     default_output_bw=default_output_bw,
                                     default_param_bw=default_param_bw)

cle_quantizer.compute_encodings(forward_pass_callback=partial(data_pipeline.evaluate,
                                                              use_cuda=config['use_cuda']),
                                forward_pass_callback_args=num_batches)

accuracy = data_pipeline.evaluate(cle_quantizer.model, use_cuda=config['use_cuda'])
print("CLE applied Model Top-1 accuracy on Quant Simulator: ", accuracy)

## 4.2 Bias Correction

Perform Bias Correction and calculate the accuracy on the QuantSim model. The first cell sets two parameters for this step:

1. **num_quant_samples**: The number of samples used for quantization (computing encodings) during bias correction
2. **num_bias_correct_samples**: The number of samples used to perform bias correction

In [None]:
# Uncomment one of the following sets of parameters
# num_quant_samples = 16 #Typical
# num_bias_correct_samples = 16 #Typical

num_quant_samples = 1 #Test
num_bias_correct_samples = 1 #Test

Here the actual bias correction steps are performed:

In [None]:
data_loader = data_pipeline.data_loader()

bc_params = QuantParams(weight_bw=default_param_bw,
                        act_bw=default_output_bw,
                        round_mode=rounding_mode,
                        quant_scheme=quant_scheme)

correct_bias(model,
             bc_params,
             num_quant_samples=num_quant_samples,
             data_loader=data_loader,
             num_bias_correct_samples=num_bias_correct_samples)

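Finally, the bias-corrected model is quantized, its accuracy is logged, and the model is saved. Note that `torch.save` stores the floating-point model after CLE and BC; the QuantSim encodings are not part of this file.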

In [None]:
dummy_input = torch.rand(1, 3, 224, 224)
if config['use_cuda']:
    dummy_input = dummy_input.to(torch.device('cuda'))

bc_quantizer = QuantizationSimModel(model=model,
                                    quant_scheme=quant_scheme,
                                    dummy_input=dummy_input,
                                    rounding_mode=rounding_mode,
                                    default_output_bw=default_output_bw,
                                    default_param_bw=default_param_bw,
                                    in_place=False)

bc_quantizer.compute_encodings(forward_pass_callback=partial(data_pipeline.evaluate,
                                                             use_cuda=config['use_cuda']),
                               forward_pass_callback_args=num_batches)

accuracy = data_pipeline.evaluate(bc_quantizer.model, use_cuda=config['use_cuda'])
print("Quantized (INT8) Model Top-1 Accuracy After Bias Correction: ", accuracy)

torch.save(model, os.path.join(config['logdir'], 'cle_bc_model.pth'))

## Resources

For more information on how Cross Layer Equalization and Bias Correction work, be sure to check out this [page](https://quic.github.io/aimet-pages/AimetDocs/user_guide/post_training_quant_techniques.html#ug-post-training-quantization) on post-training quantization techniques and this [paper](https://arxiv.org/abs/1906.04721) on Cross Layer Equalization and Bias Correction.

For more information about AIMET's APIs, visit the [documentation](https://quic.github.io/aimet-pages/AimetDocs/api_docs/torch_quantization.html) on Torch Model Quantization. For a better understanding of what AIMET has to offer, be sure to check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLd0XF75dq-1a7OZTl1kAiM2ZqeKqQpKFH), and this [page](https://quic.github.io/aimet-pages/index.html) on AIMET.