# Activation Threshold Demonstration For Post-Training Quantization




[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.ipynb)

## Overview

This tutorial demonstrates the process used to find the activation threshold, a step that MCT uses during post-training quantisation.

In this example we run two MCT quantisation methods on MobileNetV2, feed a representative dataset through the model, plot the activation distributions of two layers together with their MCT-calculated thresholds, and finally compare the accuracy of the two quantised models.



## Activation threshold explanation


Quantisation thresholds are used to map a distribution of 32-bit float values to their 8-bit quantised counterparts. Doing this with the least loss of data while maintaining the most representative range is important for final model accuracy.


MCT's post-training quantisation uses a representative dataset to collect typical output activation values. The challenge is how best to map these values to their quantised counterparts. The process consists of two main steps: z-score thresholding and quantisation threshold selection. First, anomalous values must be removed, because they may make the final threshold larger than it needs to be (reducing the granularity of the mapping and therefore also the accuracy of the model).
Here MCT uses z-score thresholding on the activation values, establishing a threshold value beyond which anomalous values are removed. How this is used will be covered in another tutorial.
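
As a rough illustration of the idea, the sketch below drops values whose absolute z-score exceeds a chosen limit. The function name `remove_outliers_by_z_score` and the synthetic data are assumptions made for this example; MCT's internal implementation may differ in its details.

```python
import numpy as np

def remove_outliers_by_z_score(values, z_threshold):
    """Keep only values whose |z-score| is at most z_threshold (illustrative helper)."""
    values = np.asarray(values, dtype=np.float64)
    mean = values.mean()
    std = values.std()
    if std == 0:
        # All values are identical, so there is nothing to filter.
        return values
    z_scores = np.abs(values - mean) / std
    return values[z_scores <= z_threshold]

# Synthetic activations: a standard normal bulk plus one extreme outlier.
rng = np.random.default_rng(0)
activations = np.concatenate([rng.normal(0.0, 1.0, 10_000), [500.0]])

filtered = remove_outliers_by_z_score(activations, z_threshold=16)
print(activations.max(), filtered.max())
```

Without the filtering step, the single outlier would dominate the range that the quantisation threshold has to cover.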

For the quantisation threshold, MCT offers a number of error metrics. Mean squared error (MSE) typically performs best and is used by default.

The error is calculated from the difference between the float and quantised distributions, and the threshold with the minimum error is selected. For the case of MSE:

$$
ERR(t) = \frac{1}{n_s} \sum_{X \in F_l(D)} (Q(X, t, n_b) - X)^2
$$

$ERR(t)$: The quantization error function dependent on threshold $t$.

$n_s$: The size of the representative dataset, indicating normalization over the dataset's size.

$\sum$: Summation over all elements $X$ in the flattened dataset $F_l(D)$.

$F_l(D)$: The collection of activation tensors in the l-th layer, representing the dataset D flattened for processing.

$Q(X, t, n_b)$: The quantized approximation of $X$, given a threshold $t$ and bit width $n_b$.

$X$: The original activation tensor before quantization.

$t$: The quantization threshold, a critical parameter for controlling the quantization process.

$n_b$: The number of bits used in the quantization process, affecting the model's precision and size.
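
The error above can be sketched in NumPy as follows. The symmetric signed uniform quantizer used for $Q$ here is an assumption chosen for illustration, as are the names `quantize` and `mse_error`; this is not MCT's own code.

```python
import numpy as np

def quantize(x, t, n_b):
    """Assumed symmetric signed uniform quantizer with threshold t and n_b bits."""
    step = t / 2 ** (n_b - 1)
    q_min, q_max = -2 ** (n_b - 1), 2 ** (n_b - 1) - 1
    # Round to the nearest level, clip to the representable range, rescale.
    return np.clip(np.round(x / step), q_min, q_max) * step

def mse_error(samples, t, n_b):
    """ERR(t): squared quantisation error summed over tensors, divided by n_s."""
    n_s = len(samples)
    return sum(np.sum((quantize(x, t, n_b) - x) ** 2) for x in samples) / n_s

# Two toy "activation tensors" standing in for F_l(D).
samples = [np.array([0.5, 0.25, -0.75]), np.array([0.1, 2.0, -0.3])]
for t in (1.0, 2.0, 4.0):
    print(t, mse_error(samples, t, n_b=8))
```

Values that lie exactly on a quantisation level contribute no error, while values beyond the threshold are clipped and contribute the most.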

To make the threshold search efficient, the search space is restricted to **power-of-two** values only. This both limits the number of candidate thresholds to a reasonable number and increases hardware efficiency.
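
A minimal sketch of such a restricted search is shown below. It assumes a symmetric signed quantizer and a candidate list built by repeatedly halving the smallest power of two that covers the data; the function names and the number of candidates are illustrative choices, not MCT's actual search procedure.

```python
import numpy as np

def quantize(x, t, n_b):
    """Assumed symmetric signed uniform quantizer (illustrative, not MCT's code)."""
    step = t / 2 ** (n_b - 1)
    return np.clip(np.round(x / step), -2 ** (n_b - 1), 2 ** (n_b - 1) - 1) * step

def search_power_of_two_threshold(x, n_b=8, n_candidates=10):
    """Return the power-of-two threshold with the lowest mean squared error."""
    x = np.asarray(x, dtype=np.float64)
    # Largest candidate: the smallest power of two that covers max |x|.
    # Assumes x contains at least one non-zero value.
    top_exp = int(np.ceil(np.log2(np.max(np.abs(x)))))
    candidates = [2.0 ** (top_exp - i) for i in range(n_candidates)]
    errors = [np.mean((quantize(x, t, n_b) - x) ** 2) for t in candidates]
    return candidates[int(np.argmin(errors))], candidates

# Synthetic activations: a normal bulk plus one large value at 15.0.
rng = np.random.default_rng(0)
activations = np.append(rng.normal(0.0, 1.0, 50_000), 15.0)

best_t, candidates = search_power_of_two_threshold(activations)
print("candidates:", candidates)
print("selected threshold:", best_t)
```

Because each candidate is half the previous one, only a handful of error evaluations are needed per layer.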



Error methods supported by MCT:

NOCLIPPING - Use min/max values as thresholds.

MSE - Use min square error for minimizing quantization noise.

MAE - Use min absolute error for minimizing quantization noise.

KL - Use KL-divergence to make the signals' distributions as similar as possible.

LP - Use Lp-norm for minimizing quantization noise.
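
To show how the norm-based metrics relate to each other, here is a small sketch in which MAE and MSE are the $p=1$ and $p=2$ cases of a generic Lp error. The helper `lp_error` and its mean-based normalisation are assumptions for illustration; MCT's definitions may be normalised differently, and KL is not covered here.

```python
import numpy as np

def lp_error(x, x_q, p=2):
    """Mean of |x_q - x|^p: p=1 gives MAE, p=2 gives MSE (illustrative helper)."""
    x = np.asarray(x, dtype=np.float64)
    x_q = np.asarray(x_q, dtype=np.float64)
    return float(np.mean(np.abs(x_q - x) ** p))

# Float values and a hypothetical quantised version in which 3.0 was clipped to 1.0.
x = np.array([0.0, 1.0, 2.0, 3.0])
x_q = np.array([0.0, 1.0, 2.0, 1.0])

mae = lp_error(x, x_q, p=1)
mse = lp_error(x, x_q, p=2)
print(mae, mse)
```

Higher values of `p` penalise large individual errors, such as heavily clipped values, more strongly than small rounding errors.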

## Setup

Install and import the relevant packages:


In [None]:
!pip install -q tensorflow
!pip install -q mct-nightly

In [None]:
import tensorflow as tf
import keras
import model_compression_toolkit as mct
import os

Clone MCT to gain access to the tutorial scripts:

In [None]:
!git clone https://github.com/sony/model_optimization.git local_mct
!pip install -r ./local_mct/requirements.txt
import sys
sys.path.insert(0,"./local_mct")
import tutorials.resources.utils.keras_tutorial_tools as tutorial_tools

## Dataset

Load the ImageNet classification dataset and separate a small representative subset of it to use for quantisation.

In [None]:
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !mv ILSVRC2012_devkit_t12.tar.gz imagenet/
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    !mv ILSVRC2012_img_val.tar imagenet/

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import torchvision
if not os.path.isdir('imagenet/val'):
    ds = torchvision.datasets.ImageNet(root='./imagenet', split='val')

Here we create the representative dataset. For details on this step, see the ImageNet tutorial. If you are running locally, a higher fraction of the dataset can be used.

In [None]:
REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'
BATCH_SIZE = 20
fraction = 0.001
representative_dataset_gen = tutorial_tools.get_representative_dataset(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)

## MCT quantisation

In this step we load the model and quantise it with two methods of threshold error calculation: no clipping and MSE.

No clipping chooses the lowest power-of-two threshold that does not lose any data to clipping.

MSE chooses a Power of two threshold that results in the least difference between the float distribution and the quantised distribution.

This means no clipping will often result in a larger threshold, which we will see later in this tutorial.
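
The no-clipping rule can be sketched in a couple of lines. The helper name `no_clipping_threshold` is hypothetical and the function assumes at least one non-zero activation; it only illustrates why a single large value can double the threshold.

```python
import numpy as np

def no_clipping_threshold(x):
    """Smallest power of two that is at least max |x| (illustrative helper)."""
    max_abs = float(np.max(np.abs(x)))
    return float(2.0 ** np.ceil(np.log2(max_abs)))

# The largest magnitude is 3.9, so a threshold of 4.0 is enough.
t_small = no_clipping_threshold(np.array([-0.7, 1.2, 3.9]))

# A slightly larger maximum of 4.1 forces the next power of two, 8.0.
t_large = no_clipping_threshold(np.array([-0.7, 1.2, 4.1]))

print(t_small, t_large)
```

An MSE-based search is free to pick a smaller power of two than this when clipping a few large values costs less error than coarsening the quantisation grid for all the others.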

First we load MobileNetV2 from the Keras library:

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2
float_model = MobileNetV2()

Next, the quantisation parameters are defined.

In [None]:
from model_compression_toolkit import QuantizationErrorMethod

# Specify the IMX500-v1 target platform capability (TPC)
tpc = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version='v1')

# Set the following quantization configurations:
# Choose the desired QuantizationErrorMethod for the quantization parameters search.
# Enable weights bias correction induced by quantization.
# Enable shift negative corrections for improving 'signed' non-linear functions quantization (such as swish, prelu, etc.)
# Set the threshold to filter outliers with z_score of 16.

# Dictionary mapping each error method to its QuantizationConfig
q_configs_dict = {}

# Common parameters
weights_bias_correction = True
shift_negative_activation_correction = True
z_threshold = 16

You can edit the code below to quantise with other error metrics MCT supports.

In [None]:
# Error methods to iterate over
error_methods = [
    QuantizationErrorMethod.MSE,
    QuantizationErrorMethod.NOCLIPPING
]

# If you are curious you can add any of the below quantisation methods as well.
#QuantizationErrorMethod.MAE
#QuantizationErrorMethod.KL
#QuantizationErrorMethod.LP

# Iterate and build the QuantizationConfig objects
for error_method in error_methods:
    q_config = mct.QuantizationConfig(
        activation_error_method=error_method,
        weights_error_method=error_method,
        weights_bias_correction=weights_bias_correction,
        shift_negative_activation_correction=shift_negative_activation_correction,
        z_threshold=z_threshold
    )

    q_configs_dict[error_method] = q_config

Finally we quantise the model. This can take some time.

In [None]:
quantized_models_dict = {}

for error_method, q_config in q_configs_dict.items():
    # Create a CoreConfig object with the current quantization configuration
    ptq_config = mct.core.CoreConfig(quantization_config=q_config)

    # Perform MCT post-training quantization
    quantized_model, quantization_info = mct.ptq.keras_post_training_quantization_experimental(
        in_model=float_model,
        representative_data_gen=representative_dataset_gen,
        core_config=ptq_config,
        target_platform_capabilities=tpc
    )

    # Update the dictionary to include the quantized model
    quantized_models_dict[error_method] = {
        "quantization_config": q_config,
        "quantized_model": quantized_model,
        "quantization_info": quantization_info
    }


## Threshold and Distribution Visualisation

To assist with understanding, we will now plot the thresholds and distributions from two of MobileNetV2's layers.

MCT's quantization_info stores threshold data per layer. However, to see the distribution of the activations, the model needs to be rebuilt up to and including the layer chosen for distribution visualisation.

To do this we first need to list the layer names. With Keras this can be done easily for the first 10 layers with the following.

In [None]:
for index, layer in enumerate(float_model.layers):
    if index < 10:
        print(layer.name)
    else:
        break

The first activation layer in the model is 'Conv1_relu'.

For this particular model, expanded_conv_project_BN demonstrates the difference between the two error metrics, so we will also use it.

Use these layer names to create a pair of models that end in these respective layers.

In [None]:
from tensorflow.keras.models import Model
layer_name1 = 'Conv1_relu'
layer_name2 = 'expanded_conv_project_BN'

layer_output1 = float_model.get_layer(layer_name1).output
activation_model_relu = Model(inputs=float_model.input, outputs=layer_output1)
layer_output2 = float_model.get_layer(layer_name2).output
activation_model_project = Model(inputs=float_model.input, outputs=layer_output2)

Feed the representative dataset through these models and store the output.

In [None]:
import numpy as np
activation_batches_relu = []
activation_batches_project = []
for images in representative_dataset_gen():
    activations_relu = activation_model_relu.predict(images)
    activation_batches_relu.append(activations_relu)
    activations_project = activation_model_project.predict(images)
    activation_batches_project.append(activations_project)

all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()
all_activations_project = np.concatenate(activation_batches_project, axis=0).flatten()

Thresholds calculated by MCT during quantisation can be accessed using the following. The layer number matches the index of the layers named in the previous steps.

As mentioned above we use the first activation relu layer and the batch normalisation layer as they best demonstrate the effect of the two threshold error methods.

In [None]:
optimal_thresholds_relu = {error_method: data["quantized_model"].layers[4].activation_holder_quantizer.get_config()['threshold'][0] for error_method, data in quantized_models_dict.items()}
optimal_thresholds_project = {error_method: data["quantized_model"].layers[9].activation_holder_quantizer.get_config()['threshold'][0] for error_method, data in quantized_models_dict.items()}

### Distribution Plots

These are the distributions of the two layers: first Conv1_relu, then expanded_conv_project_BN.

The second distribution shows distinctly the difference in the result of the two error metrics.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
optimal_thresholds_relu = {error_method: data["quantized_model"].layers[4].activation_holder_quantizer.get_config()['threshold'][0] for error_method, data in quantized_models_dict.items()}

# Plotting
plt.figure(figsize=(10, 6))
plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Original')
for method, threshold in optimal_thresholds_relu.items():
    plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')

plt.title('Activation Distribution with Optimal Quantization Thresholds First Relu Layer')
plt.xlabel('Activation Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Plotting
plt.figure(figsize=(10, 6))
plt.hist(all_activations_project, bins=100, alpha=0.5, label='Original')
for method, threshold in optimal_thresholds_project.items():
    plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')

plt.title('Activation Distribution with Optimal Quantization Thresholds Project BN Layer')
plt.xlabel('Activation Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
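
To put a number on what the plots show, one option is to measure the fraction of activations that fall outside a given threshold. The helper `clipped_fraction` below is a hypothetical addition and is demonstrated on a toy array rather than real activations.

```python
import numpy as np

def clipped_fraction(activations, threshold):
    """Fraction of values whose magnitude exceeds the threshold (illustrative helper)."""
    activations = np.asarray(activations, dtype=np.float64)
    return float(np.mean(np.abs(activations) > threshold))

# Toy data: two of the four values lie outside a threshold of 1.0.
toy_activations = np.array([0.5, 1.5, -3.0, 0.1])
fraction_clipped = clipped_fraction(toy_activations, threshold=1.0)
print(fraction_clipped)
```

In this notebook the same function could be applied to `all_activations_project` with each value in `optimal_thresholds_project` to compare how much data each error method clips.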

## Accuracy

Finally we can show the effect of these different thresholds on the model's accuracy.

In [None]:
TEST_DATASET_FOLDER = './imagenet/val'
evaluation_dataset = tutorial_tools.get_validation_dataset_fraction(0.005, TEST_DATASET_FOLDER, BATCH_SIZE)

In [None]:
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
results = float_model.evaluate(evaluation_dataset)

In [None]:
evaluation_results = {}

for error_method, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]

    quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])

    results = quantized_model.evaluate(evaluation_dataset, verbose=0)  # Set verbose=0 to suppress the log messages

    evaluation_results[error_method] = results

    # Print the results
    print(f"Results for {error_method}: Loss = {results[0]}, Accuracy = {results[1]}")

These results mirror the case for many models, which is why MSE has been chosen as the default by the MCT team.

Each of MCT's error methods has a different effect on different models, so it is always worth including the error method in hyperparameter tuning when trying to improve quantised model accuracy.

## Conclusion

In this tutorial, we demonstrated the methods used to find a layer's quantisation threshold for activations. The process is similar for weight quantisation, but a representative dataset is not required. Use this code to assist with choosing error methods for your own model.




## Appendix

Some code to assist with gaining information from each layer in the MCT quantisation output.

In [None]:
# Inspect one of the quantised models (here, the last one produced above)
quantized_model = list(quantized_models_dict.values())[-1]["quantized_model"]

relu_layer_indices = []

for i, layer in enumerate(quantized_model.layers):
    # Convert the layer's configuration and class name to strings
    layer_config_str = str(layer.get_config())
    layer_class_str = str(layer.__class__.__name__)

    # Check if "relu" is mentioned in the layer's configuration or class name
    if 'relu' in layer_config_str.lower() or 'relu' in layer_class_str.lower():
        relu_layer_indices.append(i)

print("Layer indices potentially using ReLU:", relu_layer_indices)
print("Number of relu layers " + str(len(relu_layer_indices)))


In [None]:
for error_method, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]
    print(quantized_model.layers[1])



Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
