# Cross-Layer Equalization (CLE) and Bias Correction (BC)

This notebook shows a working example of how to use AIMET to apply Cross-Layer Equalization (CLE) and Bias Correction (BC). CLE and BC are post-training quantization techniques that aim to improve the quantized accuracy of a given model. CLE does not need any data samples. BC may optionally use unlabelled data samples. These techniques help recover quantized accuracy when the model is more sensitive to parameter quantization than to activation quantization.

To learn more about these techniques, please refer to the paper "Data-Free Quantization Through Weight Equalization and Bias Correction" (ICCV 2019): https://arxiv.org/abs/1906.04721

**Cross-Layer Equalization**
AIMET performs the following steps when running CLE:
1. Batch Norm Folding: Folds BN layers into the Conv layers immediately before or after them.
2. Cross-Layer Scaling: Given a set of consecutive Conv layers, equalizes the per-channel range of weight values by scaling the per-channel weights of one layer up or down and applying the inverse scaling to the corresponding per-channel weights of the subsequent layer.
3. High Bias Folding: Cross-layer scaling may result in high bias parameter values for some layers. This technique folds some of the bias of a layer into the subsequent layer's parameters.
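
The cross-layer scaling step can be illustrated with a minimal pure-Python sketch. It assumes two fully connected layers with a ReLU in between (standing in for Conv layers) and the scale factor `s_i = sqrt(r1_i / r2_i)` described in the referenced paper; the helper names `linear`, `relu`, and `cross_layer_scale` are hypothetical and are not AIMET APIs.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def linear(w, b, x):
    # One row of w per output channel.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def cross_layer_scale(w1, b1, w2):
    """Equalize the per-channel weight ranges of two consecutive layers."""
    w1_new, b1_new = [], []
    w2_new = [row[:] for row in w2]
    for i, row in enumerate(w1):
        r1 = max(abs(x) for x in row)        # range of output channel i of layer 1
        r2 = max(abs(r[i]) for r in w2)      # range of input channel i of layer 2
        s = math.sqrt(r1 / r2)               # both ranges become sqrt(r1 * r2)
        w1_new.append([x / s for x in row])  # divide layer 1 channel i by s ...
        b1_new.append(b1[i] / s)
        for r in w2_new:
            r[i] *= s                        # ... and multiply layer 2 channel i by s
    return w1_new, b1_new, w2_new

# Illustrative values only (assumption, not taken from any real model).
w1 = [[4.0, -8.0], [0.1, 0.05]]
b1 = [1.0, 0.02]
w2 = [[0.5, 2.0], [-0.25, 4.0]]
b2 = [0.0, 0.0]
x = [3.0, 1.0]

w1_eq, b1_eq, w2_eq = cross_layer_scale(w1, b1, w2)
y_before = linear(w2, b2, relu(linear(w1, b1, x)))
y_after = linear(w2_eq, b2, relu(linear(w1_eq, b1_eq, x)))
print(y_before, y_after)
```

The network output is unchanged because ReLU satisfies `relu(z / s) == relu(z) / s` for any positive `s`, while the two layers end up with matching per-channel ranges, which is what makes per-tensor quantization less lossy.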

**Bias Correction**  
Quantization sometimes leads to a shift in layer outputs. This technique helps correct this shift by adjusting the bias parameters of that layer. Note that this technique is generally applied after CLE, but it is an optional step.
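
As a rough illustration of the idea, the sketch below measures the mean output shift that weight quantization introduces in a single linear layer and subtracts it from the bias. It is a simplification under stated assumptions: one output channel, a deliberately coarse 3-bit grid to make the shift visible, and hypothetical helpers (`fake_quantize`, `layer`, `mean`). It is not how AIMET's `correct_bias` is implemented.

```python
def fake_quantize(values, num_bits):
    """Round each value to a uniform grid spanning [min, max] (quantize-dequantize)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2 ** num_bits - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]

def layer(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def mean(values):
    return sum(values) / len(values)

# Illustrative weights and unlabeled samples (assumptions).
weights = [0.9, -0.35, 0.2, 0.05]
bias = 0.1
samples = [[1.0, 2.0, 0.5, 3.0], [0.5, 1.5, 1.0, 2.0], [2.0, 1.0, 0.0, 4.0]]

q_weights = fake_quantize(weights, num_bits=3)

fp_mean = mean([layer(weights, bias, x) for x in samples])
q_mean = mean([layer(q_weights, bias, x) for x in samples])

shift = q_mean - fp_mean            # systematic output shift caused by quantization
corrected_bias = bias - shift       # absorb the shift into the bias
corrected_mean = mean([layer(q_weights, corrected_bias, x) for x in samples])
print(shift, fp_mean, corrected_mean)
```

After the correction, the mean output of the quantized layer matches the FP32 mean on the samples used, even though the individual weights are still quantized.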


#### Overall flow
This notebook covers the following:
1. Instantiate the example evaluation and training pipeline
2. Load the FP32 model and evaluate the model to find the baseline FP32 accuracy
3. Create a quantization simulation model (with fake quantization ops inserted) and evaluate this simulation model to get a quantized accuracy score
4. Apply CLE and BC, then evaluate the simulation model to get a post-CLE/BC quantized accuracy score


#### What this notebook is not
* This notebook is not designed to show state-of-the-art results. For example, it uses a relatively quantization-friendly model like Resnet18. Also, some optimization parameters are deliberately chosen to have the notebook execute more quickly.


In [1]:
import cv2
import os
import torch
import json
import numpy as np
from tqdm.notebook import tqdm

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

from mmcv.transforms import Compose
from mmdet.utils import get_test_pipeline_cfg

def read_json(json_path):
    with open(json_path) as f:
        data = json.load(f)
    return data

def read_txt(txt_path):
    with open(txt_path) as f:
        data = f.readlines()
    data = [x.strip() for x in data]
    return data

def preprocess(test_pipeline, image):
    if isinstance(image, np.ndarray):
        # Calling this method across libraries will result
        # in module unregistered error if not prefixed with mmdet.
        test_pipeline[0].type = 'mmdet.LoadImageFromNDArray'
    test_pipeline = Compose(test_pipeline)
    return test_pipeline(dict(img=image))

class CustomImageDataset(torch.utils.data.Dataset):
    def __init__(self, images_dir, annotations_json_path, transform=None):
        self.transform = transform
        self.images_dir = images_dir
        self.annotations_json = read_json(annotations_json_path)


    def __len__(self):
        return len(self.annotations_json['images'])

    def __getitem__(self, idx):
        image_dict = self.annotations_json['images'][idx]
        image_path = os.path.join(self.images_dir, image_dict['file_name'])
        image_id = image_dict['id']

        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.transform:
            transformed_images = self.transform(image)
        else:
            transformed_images = image

        return image_id, image_path, transformed_images


# calibrationDataloader = DataLoader(calibrationDataset, batch_size=32, shuffle=True)

In [2]:
import torch
from mmdet.apis import DetInferencer

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize([640, 640]),  # Resize
])

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
CONFIG_PATH = '/teamspace/studios/this_studio/mmdetection/rtmdet_tiny_8xb32-300e_coco.py'
WEIGHTS_PATH = '/teamspace/studios/this_studio/mmdetection/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
EVAL_DATASET_SIZE = 5000
CALIBRATION_DATASET_SIZE = 1000
BATCH_SIZE = 64

ROOT_DATASET_DIR = '/teamspace/studios/this_studio/COCO'
IMAGES_DIR = os.path.join(ROOT_DATASET_DIR, 'images')
ANNOTATIONS_JSON_PATH = os.path.join(ROOT_DATASET_DIR, 'annotations/instances_val2017.json')
# ANNOTATIONS_JSON_PATH = "/home/shayaan/Desktop/aimet/my_mmdet/temp.json"

model = DetInferencer(model=CONFIG_PATH, weights=WEIGHTS_PATH, device=DEVICE)
evalDataset = CustomImageDataset(images_dir=IMAGES_DIR, annotations_json_path=ANNOTATIONS_JSON_PATH, transform=transform)
eval_data_loader = DataLoader(evalDataset, batch_size=BATCH_SIZE)
calibration_images = read_txt('/teamspace/studios/this_studio/aimet/Examples/torch/quantization/calibration_image_ids.txt')
calibration_data_loader = DataLoader(calibration_images, batch_size=BATCH_SIZE)

DEVICE

[2024-09-10 05:44:45,038] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)


/bin/ld: cannot find -laio: No such file or directory
collect2: error: ld returned 1 exit status


Loads checkpoint by local backend from path: /teamspace/studios/this_studio/mmdetection/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std





device(type='cuda', index=0)

In [4]:
from collections import OrderedDict
from copy import deepcopy

m = deepcopy(model.model)

def is_leaf(module): 
    return len(module._modules) == 0

def replace_bn(m):

    if is_leaf(m):
        return 

    for _, child in m.named_children(): 
        
        if "bn" in child._modules.keys():
            bn = child._modules.get("bn")
            bn_params = deepcopy(bn._parameters)
            bn_buffers = deepcopy(bn._buffers)
            new_bn = torch.nn.BatchNorm2d(bn.num_features, eps=bn.eps, momentum=bn.momentum, affine=bn.affine, track_running_stats=bn.track_running_stats)
            new_bn._parameters["weight"].data = bn_params["weight"].data
            new_bn._parameters["bias"].data = bn_params["bias"].data
            new_bn._buffers["running_mean"].data = bn_buffers["running_mean"].data
            new_bn._buffers["running_var"].data = bn_buffers["running_var"].data
            new_bn._buffers["num_batches_tracked"].data = bn_buffers["num_batches_tracked"].data
            child._modules["bn"] = new_bn
            
        replace_bn(child)

from aimet_torch.batch_norm_fold import fold_all_batch_norms
from aimet_torch.model_preparer import prepare_model
replace_bn(m)
aimet_m = prepare_model(deepcopy(m))
folded_pairs = fold_all_batch_norms(aimet_m, input_shapes=(1, 3, 640, 640))
len(folded_pairs)
# print(m)
# m(torch.rand(1, 3, 640, 640).to(DEVICE))[0][0].shape

2024-09-10 05:44:49,094 - root - INFO - AIMET
2024-09-10 05:44:54,998 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage1.1.blocks.0.module_add} 
2024-09-10 05:44:54,999 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage1.1.module_cat} 
2024-09-10 05:44:55,000 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage1.1.attention.module_mul} 
2024-09-10 05:44:55,000 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage2.1.blocks.0.module_add_1} 
2024-09-10 05:44:55,001 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage2.1.module_cat_1} 
2024-09-10 05:44:55,001 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage2.1.attention.module_mul_1} 
2024-09-10 05:44:55,002 - ModelPreparer - INFO - Functional         : Adding new module for node: {backbone.stage3.1.blocks.0.modu

76

In [3]:
from tqdm.notebook import tqdm
import torch

from mmdet.models.utils import samplelist_boxtype2tensor
from mmengine.registry import MODELS
from mmcv.transforms import Compose

test_evaluator = model.cfg.test_evaluator
test_evaluator.type = 'mmdet.evaluation.CocoMetric' 
test_evaluator.dataset_meta = model.model.dataset_meta
test_evaluator.ann_file = ANNOTATIONS_JSON_PATH
test_evaluator = Compose(test_evaluator)

collate_preprocessor = model.preprocess
predict_by_feat = model.model.bbox_head.predict_by_feat
rescale = True

preprocessor = MODELS.build(model.cfg.model.data_preprocessor)
def add_pred_to_datasample(data_samples, results_list):
    for data_sample, pred_instances in zip(data_samples, results_list):
        data_sample.pred_instances = pred_instances
    samplelist_boxtype2tensor(data_samples)
    return data_samples

loading annotations into memory...
Done (t=0.67s)
creating index...
index created!


In [5]:
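def pass_calibration_data(model: torch.nn.Module, samples: int):
    """Forward-pass callback: run the calibration images through the model.

    Model outputs are ignored. `samples` is unused while the cap below stays commented out.
    """
    data_loader = eval_data_loader
    batch_size = data_loader.batch_size  # only the batch size is taken from the eval loader
    model.eval()
    batch_ctr = 0
    with torch.no_grad():
        for image_path in tqdm(calibration_data_loader):
            # Build full paths from the entries listed in the calibration file
            image_path = [os.path.join(IMAGES_DIR, x) for x in image_path]
            pre_processed = collate_preprocessor(inputs=image_path, batch_size=batch_size)
            _, data = list(pre_processed)[0]
            data = preprocessor(data, False)

            preds = model(data['inputs'].to(DEVICE))

            # batch_ctr += 1
            # if (batch_ctr * batch_size) > samples:
            #     break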

In [6]:
from aimet_torch.cross_layer_equalization import equalize_model, equalize_bn_folded_model

equalize_bn_folded_model(aimet_m, input_shapes=(1, 3, 640, 640), folded_pairs=folded_pairs, dummy_input=torch.rand(1, 3, 640, 640))

input_nodes=[Conv_0]
Visiting node: GraphModule.backbone.stem.0.conv
Visiting node: GraphModule.CG_Split_0
Visiting node: GraphModule.backbone.stem.0.activate.sigmoid
Visiting node: GraphModule.backbone.stem.0.activate.mul
Visiting node: GraphModule.backbone.stem.1.conv
Visiting node: GraphModule.CG_Split_1
Visiting node: GraphModule.backbone.stem.1.activate.sigmoid
Visiting node: GraphModule.backbone.stem.1.activate.mul
Visiting node: GraphModule.backbone.stem.2.conv
Visiting node: GraphModule.CG_Split_2
Visiting node: GraphModule.backbone.stem.2.activate.sigmoid
Visiting node: GraphModule.backbone.stem.2.activate.mul
Visiting node: GraphModule.backbone.stage1.0.conv
Visiting node: GraphModule.CG_Split_3
Visiting node: GraphModule.backbone.stage1.0.activate.sigmoid
Visiting node: GraphModule.backbone.stage1.0.activate.mul
Visiting node: GraphModule.CG_Split_4
Visiting node: GraphModule.backbone.stage1.1.short_conv.conv
Visiting node: GraphModule.CG_Split_5
Visiting node: GraphModule.b

AIMET quantization simulation requires the user's model definition to follow certain guidelines. For example, functionals used in the forward pass should be replaced with their equivalent torch.nn.Module.
The AIMET user guide lists all these guidelines.
The **ModelPreparer** API uses the graph transformation feature available in PyTorch 1.9 and later to automate the model definition changes required to comply with these guidelines.

---
We should decide whether to place the model on a CPU or CUDA device. This example code will use CUDA if available in your current execution environment. You can change this logic and force a device placement if needed.

In [17]:
use_cuda = False
if torch.cuda.is_available():
    use_cuda = True
    model.to(torch.device('cuda'))
    
use_cuda

True

---
Let's determine the FP32 (floating point 32-bit) accuracy of this model using the evaluate() routine

In [18]:
accuracy = ImageNetDataPipeline.evaluate(model, use_cuda)
print(accuracy)

2024-09-09 08:05:35,560 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:05:35,572 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2024-09-09 08:05:35,579 - Eval - INFO - Evaluating nn.Module for 313 iterations with batch_size 32



2024-09-09 08:06:51,073 - Eval - INFO - Avg accuracy Top 1: 71.285942 Avg accuracy Top 5: 90.335463 on validation Dataset
71.28594249201278


---
## 3. Create a quantization simulation model and determine quantized accuracy

## Fold Batch Normalization layers
Before we determine the simulated quantized accuracy using QuantizationSimModel, we will fold the BatchNormalization (BN) layers in the model. These layers get folded into adjacent Convolutional layers. The BN layers that cannot be folded are left as they are.

**Why do we need to do this?**
On quantized runtimes (like TFLite, Snapdragon Neural Processing SDK, etc.), it is common practice to fold the BN layers. Doing so increases inferences per second because unnecessary computation is avoided. In floating point, a BN-folded model is mathematically equivalent at inference time to the model with BN layers and produces the same accuracy. However, folding the BN layers can increase the range of the weight values of the adjacent layers, which can hurt the quantized accuracy of the model (especially when using INT8 or lower precision). So we want to simulate that on-target behavior by doing BN folding here.
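
To make the equivalence and the range increase concrete, here is a minimal sketch of BN folding for a fully connected layer (standing in for a Conv layer). The helpers `linear`, `batch_norm`, and `fold_bn` and all numeric values are assumptions for illustration; AIMET's `fold_all_batch_norms` handles real model graphs.

```python
import math

def linear(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BN using the stored running statistics.
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the weights and bias of the preceding layer."""
    folded_w, folded_b = [], []
    for row, bi, g, bt, m, v in zip(w, b, gamma, beta, mean, var):
        k = g / math.sqrt(v + eps)           # per-channel multiplier
        folded_w.append([wi * k for wi in row])
        folded_b.append((bi - m) * k + bt)
    return folded_w, folded_b

# Illustrative parameters; channel 1 has a small running variance.
w = [[0.5, -0.2], [0.3, 0.4]]
b = [0.1, -0.1]
gamma, beta = [1.0, 2.0], [0.0, 0.5]
mean, var = [0.2, 0.1], [1.0, 0.01]
x = [1.0, -2.0]

y_bn = batch_norm(linear(w, b, x), gamma, beta, mean, var)
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
y_folded = linear(w_f, b_f, x)

range_before = max(abs(v) for row in w for v in row)
range_after = max(abs(v) for row in w_f for v in row)
print(y_bn, y_folded, range_before, range_after)
```

The outputs agree, but the channel with the small variance gets multiplied by a large factor, so the overall weight range grows and a single per-tensor quantization grid becomes coarser for the other channels.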

The following code calls AIMET to fold the BN layers in-place on the given model

In [19]:
from aimet_torch.batch_norm_fold import fold_all_batch_norms

_ = fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))

2024-09-09 08:07:10,439 - BatchNormFolding - INFO - 0 BatchNorms' weights got converted


---
## Create Quantization Sim Model
Now we use AIMET to create a QuantizationSimModel. This basically means that AIMET will insert fake quantization ops in the model graph and will configure them.
A few of the parameters are explained here
- **quant_scheme**: We set this to "QuantScheme.post_training_tf_enhanced"
    - Supported options are 'tf_enhanced' or 'tf' or using Quant Scheme Enum QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced
- **default_output_bw**: Setting this to 8, essentially means that we are asking AIMET to perform all activation quantizations in the model using integer 8-bit precision
- **default_param_bw**: Setting this to 8, essentially means that we are asking AIMET to perform all parameter quantizations in the model using integer 8-bit precision
- **num_batches**: The number of batches used for computing the quantization encodings. Only 5 batches are used here to speed up the process. The number of images in these 5 batches should still be sufficient for computing encodings
- **rounding_mode**: The rounding mode used for quantization. There are two possible choices here: 'nearest' or 'stochastic'. We will use 'nearest'.

There are other parameters that are set to default values in this example. Please check the AIMET API documentation of QuantizationSimModel to see reference documentation for all the parameters.
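
The effect of the bit-width and rounding-mode parameters can be sketched with a generic uniform affine fake-quantization function. This is an illustrative assumption, not AIMET's implementation: it uses a plain min/max range, and the `fake_quantize` helper and its arguments are hypothetical.

```python
import math
import random

def fake_quantize(x, scale, offset, num_bits=8, rounding="nearest", rng=random):
    """Quantize x to an integer grid and map it back to floating point."""
    t = x / scale
    if rounding == "nearest":
        q = round(t)
    else:  # "stochastic": round up with probability equal to the fractional part
        q = math.floor(t) + (1 if rng.random() < t - math.floor(t) else 0)
    q = max(0, min(2 ** num_bits - 1, q - offset))  # clamp to the integer range
    return (q + offset) * scale

# Assumed value range for one tensor.
min_val, max_val, num_bits = -1.0, 3.0, 8
scale = (max_val - min_val) / (2 ** num_bits - 1)
offset = round(min_val / scale)  # integer offset so that 0.0 lies exactly on the grid

values = [-1.0, -0.3, 0.0, 0.7, 2.5, 3.0]
nearest = [fake_quantize(v, scale, offset, num_bits) for v in values]
rng = random.Random(0)
stochastic = [fake_quantize(v, scale, offset, num_bits, "stochastic", rng) for v in values]
print(nearest)
print(stochastic)
```

With nearest rounding, the error for in-range values is at most half a quantization step; with stochastic rounding it can be up to one full step, but it is unbiased on average.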

In [20]:
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

dummy_input = torch.rand(1, 3, 224, 224)    # Shape for each ImageNet sample is (3 channels) x (224 height) x (224 width)
if use_cuda:
    dummy_input = dummy_input.cuda()

sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           dummy_input=dummy_input,
                           default_output_bw=8,
                           default_param_bw=8)

2024-09-09 08:07:11,769 - Quant - INFO - No config file provided, defaulting to config file at /usr/local/lib/python3.10/dist-packages/aimet_common/quantsim_config/default_config.json
2024-09-09 08:07:11,820 - Quant - INFO - Unsupported op type Squeeze
2024-09-09 08:07:11,821 - Quant - INFO - Unsupported op type Mean
2024-09-09 08:07:11,829 - Quant - INFO - Selecting DefaultOpInstanceConfigGenerator to compute the specialized config. hw_version:default


---
We can check the modifications AIMET has made to the model graph. One way is to print the model, and we can see that AIMET has added quantization wrapper layers. Note: use sim.model to access the modified PyTorch model. By default, AIMET creates a copy of the original model prior to modifying it. There is a parameter to override this behavior.

In [21]:
print(sim.model)

GraphModule(
  (features): Module(
    (0): Module(
      (0): StaticGridQuantWrapper(
        (_module_to_wrap): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      )
      (1): Identity()
      (2): StaticGridQuantWrapper(
        (_module_to_wrap): ReLU6(inplace=True)
      )
    )
    (1): Module(
      (conv): Module(
        (0): Module(
          (0): StaticGridQuantWrapper(
            (_module_to_wrap): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
          )
          (1): Identity()
          (2): StaticGridQuantWrapper(
            (_module_to_wrap): ReLU6(inplace=True)
          )
        )
        (1): StaticGridQuantWrapper(
          (_module_to_wrap): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))
        )
        (2): Identity()
      )
    )
    (2): Module(
      (conv): Module(
        (0): Module(
          (0): StaticGridQuantWrapper(
            (_module_to_wrap): Conv2d(16, 96, kernel_size=(1, 1), stride=(1

---
We can also check how AIMET has configured the added fake quantization nodes, which AIMET refers to as 'quantizers'. You can see this by printing the sim object.

In [22]:
print(sim)

-------------------------
Quantized Model Report
-------------------------
----------------------------------------------------------
Layer: features.0.0
  Input[0]: bw=8, encoding-present=False
  -------
  Param[weight]: bw=8, encoding-present=False
  -------
  Param[bias]: Not quantized
  -------
  Output[0]: Not quantized
  -------
----------------------------------------------------------
Layer: features.0.2
  Input[0]: Not quantized
  -------
  Output[0]: bw=8, encoding-present=False
  -------
----------------------------------------------------------
Layer: features.1.conv.0.0
  Input[0]: Not quantized
  -------
  Param[weight]: bw=8, encoding-present=False
  -------
  Param[bias]: Not quantized
  -------
  Output[0]: Not quantized
  -------
----------------------------------------------------------
Layer: features.1.conv.0.2
  Input[0]: Not quantized
  -------
  Output[0]: bw=8, encoding-present=False
  -------
----------------------------------------------------------
Layer: fe

---
AIMET has added 'quantizer' nodes to the model graph, but the model is not ready to be used yet. Before we can use the sim model for inference or training, we need to find appropriate scale/offset quantization parameters for each 'quantizer' node. For activation quantization nodes, we need to pass unlabeled data samples through the model to collect range statistics, which then let AIMET calculate appropriate scale/offset quantization parameters. This process is sometimes referred to as calibration. AIMET simply refers to it as 'computing encodings'.
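
Conceptually, collecting range statistics amounts to tracking the smallest and largest value seen at each quantizer across the calibration batches. The sketch below assumes a simple min/max observer; the `RangeObserver` class and the batch values are hypothetical, and AIMET's quantization schemes may choose the range differently.

```python
class RangeObserver:
    """Tracks the running min/max of every value seen at one quantizer."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def update(self, batch):
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def compute_encoding(self, num_bits=8):
        # Extend the range to include zero so that 0.0 is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (2 ** num_bits - 1)
        offset = round(lo / scale)
        return {"min": lo, "max": hi, "scale": scale, "offset": offset}

observer = RangeObserver()
# Stand-ins for the activations produced by three calibration batches.
calibration_batches = [[0.1, 2.4, -0.3], [1.7, -1.2, 0.0], [3.9, 0.8, -0.5]]
for batch in calibration_batches:
    observer.update(batch)  # only the range is kept, the values are discarded

encoding = observer.compute_encoding(num_bits=8)
print(encoding)
```

No labels or loss values are involved, which is why unlabeled samples are enough for this step.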

So we create a routine to pass unlabeled data samples through the model. This should be fairly simple: use the existing train or validation data loader to extract some samples and pass them to the model. We don't need to compute any loss metric, so we can ignore the model output for this purpose. A few pointers regarding the data samples:

- In practice, we need a very small percentage of the overall data samples for computing encodings. For example, the training dataset for ImageNet has 1M samples. For computing encodings we only need 500 or 1000 samples.
- It may be beneficial if the samples used for computing encodings are well distributed. Not all classes need to be covered, since we are only looking at the range of values at every layer activation. However, we definitely want to avoid an extreme scenario where all samples are 'dark' or 'light'; for example, only using pictures captured at night might not give ideal results.

The `pass_calibration_data` routine defined earlier in this notebook is an example of a routine that passes unlabeled samples through the model for computing encodings. Such a routine can be written in many different ways; this is just one example.

---
Now we call AIMET to use the above routine to pass data through the model and then subsequently compute the quantization encodings. Encodings here refer to scale/offset quantization parameters.

In [24]:
sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=use_cuda)

2024-09-09 08:07:12,195 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes


---
Now the QuantizationSim model is ready to be used for inference or training. First we can pass this model to the same evaluation routine we used before. The evaluation routine will now give us a simulated quantized accuracy score for INT8 quantization instead of the FP32 accuracy score we saw before.

In [25]:
accuracy = ImageNetDataPipeline.evaluate(sim.model, use_cuda)
print(accuracy)

2024-09-09 08:07:33,552 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:07:33,555 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2024-09-09 08:07:33,556 - Eval - INFO - Evaluating nn.Module for 313 iterations with batch_size 32



2024-09-09 08:09:08,906 - Eval - INFO - Avg accuracy Top 1: 67.711661 Avg accuracy Top 5: 87.649760 on validation Dataset
67.71166134185303


---
## 4.1 Cross-Layer Equalization

The next cell performs cross-layer equalization on the model. As noted before, the function folds batch norms, applies cross-layer scaling, and then folds high biases.

**Note:** Interestingly, CLE needs BN statistics for its procedure. If a BN folded model is provided, CLE will run the CLS (cross-layer scaling) optimization step but will skip the HBA (high-bias absorption) step. To avoid this, we simply load the original model again before running CLE.

**Note:** CLE equalizes the model in-place
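
The dependence on BN statistics can be seen in a small sketch of the high-bias absorption step. It assumes the rule from the referenced paper, where the absorbed amount per channel is `c_i = max(0, beta_i - 3 * gamma_i)` with `beta` and `gamma` being the BN shift and scale; the helper names and values are hypothetical, and this is not AIMET's code.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def absorb_high_bias(b1, w2, b2, bn_beta, bn_gamma):
    """Move part of layer 1's bias into layer 2's bias using BN statistics."""
    # Pre-activations are assumed to be roughly Gaussian with mean beta and std |gamma|,
    # so almost all of them lie above beta - 3 * |gamma|.
    c = [max(0.0, beta - 3.0 * abs(gamma)) for beta, gamma in zip(bn_beta, bn_gamma)]
    b1_new = [b - ci for b, ci in zip(b1, c)]
    b2_new = [b + sum(w * ci for w, ci in zip(row, c)) for row, b in zip(w2, b2)]
    return b1_new, b2_new, c

# Illustrative values: channel 0 of layer 1 carries a high bias.
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [10.0, 0.2]
bn_beta, bn_gamma = [10.0, 0.2], [1.0, 0.5]
w2 = [[0.5, -1.0], [2.0, 0.25]]
b2 = [0.0, 1.0]
x = [-1.5, 0.4]  # gives pre-activations [8.5, 0.6], above the absorbed amount

b1_new, b2_new, c = absorb_high_bias(b1, w2, b2, bn_beta, bn_gamma)
y_before = linear(w2, b2, relu(linear(w1, b1, x)))
y_after = linear(w2, b2_new, relu(linear(w1, b1_new, x)))
print(c, y_before, y_after)
```

The outputs match only for inputs whose pre-activations stay above `c`, which is why the step needs the BN mean and scale to choose `c` safely; without those statistics there is no principled way to pick it.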

In [26]:
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

model = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1)
model = prepare_model(model)

use_cuda = False
if torch.cuda.is_available():
    use_cuda = True
    model.to(torch.device('cuda'))

2024-09-09 08:09:09,152 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.3.module_add} 
2024-09-09 08:09:09,154 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.5.module_add_1} 
2024-09-09 08:09:09,158 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.6.module_add_2} 
2024-09-09 08:09:09,160 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.8.module_add_3} 
2024-09-09 08:09:09,161 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.9.module_add_4} 
2024-09-09 08:09:09,162 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.10.module_add_5} 
2024-09-09 08:09:09,163 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.12.module_add_6} 
2024-09-09 08:09:09,164 - ModelPreparer - INFO - Functional         : Adding new module for node: {features.13.module_add_7} 

In [27]:
from aimet_torch.cross_layer_equalization import equalize_model

equalize_model(model, input_shapes=(1, 3, 224, 224))

2024-09-09 08:09:13,506 - BatchNormFolding - INFO - 0 BatchNorms' weights got converted


---
Now, we can determine the simulated quantized accuracy of the equalized model. We again create a simulation model like before and evaluate to determine simulated quantized accuracy.

In [28]:
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           dummy_input=dummy_input,
                           default_output_bw=8,
                           default_param_bw=8)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=use_cuda)

accuracy = ImageNetDataPipeline.evaluate(sim.model, use_cuda)
print(accuracy)

2024-09-09 08:09:20,802 - Quant - INFO - No config file provided, defaulting to config file at /usr/local/lib/python3.10/dist-packages/aimet_common/quantsim_config/default_config.json
2024-09-09 08:09:20,849 - Quant - INFO - Unsupported op type Squeeze
2024-09-09 08:09:20,850 - Quant - INFO - Unsupported op type Mean
2024-09-09 08:09:20,857 - Quant - INFO - Selecting DefaultOpInstanceConfigGenerator to compute the specialized config. hw_version:default
2024-09-09 08:09:22,086 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:09:33,505 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:09:33,508 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2024-09-09 08:09:33,509 - Eval - INFO - Evaluating nn.Module for 313 iterations with batch_size 32



2024-09-09 08:11:05,031 - Eval - INFO - Avg accuracy Top 1: 70.207668 Avg accuracy Top 5: 89.656550 on validation Dataset
70.2076677316294


---
## 4.2 Bias Correction

This section shows how we can apply AIMET Bias Correction on top of the already equalized model from the previous step. Bias correction under the hood uses a reference FP32 model and a QuantizationSimModel to perform its procedure. More details are explained in the AIMET User Guide documentation.

For the correct_bias API, we pass the following parameters:

- **num_quant_samples**: Number of samples used for computing encodings. We are setting this to a low number here to speed up execution. A typical number would be 500-1000.
- **num_bias_correct_samples**: Number of samples used for bias correction. We are setting this to a low number here to speed up execution. A typical number would be 1000-2000.
- **data_loader**: BC uses unlabeled data samples from this data loader.

In [29]:
from aimet_torch.quantsim import QuantParams
from aimet_torch.bias_correction import correct_bias

data_loader = ImageNetDataPipeline.get_val_dataloader()

bc_params = QuantParams(weight_bw=8, act_bw=8, round_mode="nearest",
                        quant_scheme=QuantScheme.post_training_tf_enhanced)

correct_bias(model, bc_params, num_quant_samples=16,
             data_loader=data_loader, num_bias_correct_samples=16)

2024-09-09 08:11:06,551 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:11:10,917 - Quant - INFO - No config file provided, defaulting to config file at /usr/local/lib/python3.10/dist-packages/aimet_common/quantsim_config/default_config.json
2024-09-09 08:11:10,960 - Quant - INFO - Unsupported op type Squeeze
2024-09-09 08:11:10,962 - Quant - INFO - Unsupported op type Mean
2024-09-09 08:11:10,966 - Quant - INFO - Selecting DefaultOpInstanceConfigGenerator to compute the specialized config. hw_version:default
2024-09-09 08:11:12,289 - Quant - INFO - Correcting layer features.0.0 using Empirical Bias Correction
2024-09-09 08:11:13,449 - Quant - INFO - Corrected bias for the layer
2024-09-09 08:11:13,452 - Quant - INFO - Correcting layer features.1.conv.0.0 using Empirical Bias Correction
2024-09-09 08:11:14,646 - Quant - INFO - Corrected bias for the layer
2024-09-09 08:11:14,650 - Quant - INFO - Correcting layer features.1.conv.1 using Empirical Bia

---
Now, we can determine the simulated quantized accuracy of the bias-corrected model. We again create a simulation model like before and evaluate to determine simulated quantized accuracy.

In [30]:
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           dummy_input=dummy_input,
                           default_output_bw=8,
                           default_param_bw=8)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=use_cuda)

accuracy = ImageNetDataPipeline.evaluate(sim.model, use_cuda)
print(accuracy)

2024-09-09 08:12:18,447 - Quant - INFO - No config file provided, defaulting to config file at /usr/local/lib/python3.10/dist-packages/aimet_common/quantsim_config/default_config.json
2024-09-09 08:12:18,500 - Quant - INFO - Unsupported op type Squeeze
2024-09-09 08:12:18,501 - Quant - INFO - Unsupported op type Mean
2024-09-09 08:12:18,509 - Quant - INFO - Selecting DefaultOpInstanceConfigGenerator to compute the specialized config. hw_version:default
2024-09-09 08:12:20,191 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:12:41,083 - Dataloader - INFO - Dataset consists of 10000 images in 1000 classes
2024-09-09 08:12:41,086 - Eval - INFO - No value of iteration is provided, running evaluation on complete dataset.
2024-09-09 08:12:41,086 - Eval - INFO - Evaluating nn.Module for 313 iterations with batch_size 32



2024-09-09 08:14:09,939 - Eval - INFO - Avg accuracy Top 1: 70.956470 Avg accuracy Top 5: 89.586661 on validation Dataset
70.9564696485623


---
Depending on your settings, you may have observed a slight gain in accuracy after applying CLE and BC. Of course, this was just an example. Please try this against the model of your choice and experiment with the number of samples to get the best results.
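
For the run recorded in this notebook, the gain can be quantified from the Top-1 accuracies logged in the evaluation cells above. The snippet below is plain arithmetic on those logged values; note that the post-CLE simulation cell used `QuantScheme.post_training_tf` while the other two used `post_training_tf_enhanced`, so the comparison is indicative rather than exact.

```python
# Top-1 accuracies copied from the evaluation logs in this notebook.
fp32_top1 = 71.285942
quantized_top1 = {
    "INT8, no CLE/BC": 67.711661,
    "INT8, after CLE": 70.207668,
    "INT8, after CLE + BC": 70.956470,
}

gaps = {name: fp32_top1 - acc for name, acc in quantized_top1.items()}
baseline_gap = gaps["INT8, no CLE/BC"]

for name, gap in gaps.items():
    # Share of the original FP32-to-INT8 accuracy drop that has been recovered.
    recovered = 100.0 * (baseline_gap - gap) / baseline_gap
    print(f"{name:22s} gap to FP32: {gap:5.2f} points, recovered: {recovered:5.1f}%")
```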

The next step would be to take this model to target. For this purpose, we need to export the model with the updated weights but without the fake quant ops, along with the encodings (scale/offset quantization parameters). AIMET QuantizationSimModel provides an export API for this purpose.

In [31]:
os.makedirs('./output/', exist_ok=True)
dummy_input = dummy_input.cpu()
sim.export(path='./output/', filename_prefix='resnet18_after_cle_bc', dummy_input=dummy_input)





2024-09-09 08:14:12,182 - Utils - INFO - successfully created onnx model with 99/100 node names updated
2024-09-09 08:14:12,258 - Quant - INFO - Layers excluded from quantization: []


---
## Summary

We hope this notebook was useful for understanding how to use AIMET to perform Cross-Layer Equalization (CLE) and Bias Correction (BC).

A few additional resources:
- Refer to the AIMET API docs to know more details of the APIs and optional parameters
- Refer to the other example notebooks to understand how to use AIMET post-training quantization techniques and QAT techniques