# Activation Z-Score Threshold Demonstration For Post-Training Quantization




[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb)

## Overview

This tutorial demonstrates the process used to find the activation z-score threshold, a step that MCT can use during post-training quantization.

In this example we will explore how setting different z-scores affects the threshold and accuracy. We will start by demonstrating how to apply the corresponding MCT configurations. Then we will feed a representative dataset through the model and plot the activation distribution of an activation layer together with its MCT-calculated z-score thresholds. Finally, we will compare the accuracy of the models quantized with different z-scores.


## Activation threshold explanation

During the quantization process, thresholds are used to map a distribution of 32-bit floating-point values to their quantized counterparts. Doing this with the least loss of information while keeping the most representative range is important for final model accuracy.

Some models exhibit anomalous values when fed a representative dataset. It is in the interest of the model's accuracy to remove these values so that the quantization threshold results in a more reliable range mapping.
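The sketch below gives a rough illustration of why a single outlier matters. It uses a simple unsigned 8-bit uniform quantizer whose threshold is set to the maximum observed value. The quantizer and the synthetic data are assumptions for illustration only. MCT's own quantizers and threshold search differ.

```python
import numpy as np

def quantize_dequantize(x, threshold, n_bits=8):
    """Illustrative unsigned uniform quantizer (not MCT's implementation):
    map [0, threshold) onto 2**n_bits levels, clip, then map back to float."""
    scale = threshold / 2 ** n_bits
    q = np.clip(np.round(x / scale), 0, 2 ** n_bits - 1)
    return q * scale

# Synthetic ReLU-like activations: a dense bulk in [0, 1] plus one anomalous value
bulk = np.linspace(0.0, 1.0, 1000)
with_outlier = np.append(bulk, 50.0)

# Max-based thresholds, with and without the anomalous value
t_clean = bulk.max()
t_outlier = with_outlier.max()

# Quantization error measured on the bulk of the data only
mse_clean = np.mean((bulk - quantize_dequantize(bulk, t_clean)) ** 2)
mse_outlier = np.mean((bulk - quantize_dequantize(bulk, t_outlier)) ** 2)
print(f"threshold={t_clean:.2f}: MSE on bulk = {mse_clean:.2e}")
print(f"threshold={t_outlier:.2f}: MSE on bulk = {mse_outlier:.2e}")
```

With the outlier included, the quantization step grows 50-fold. The values that make up almost all of the distribution are then represented far more coarsely.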

MCT can remove these values using z-score thresholding. This lets the user discard data based on how many standard deviations it lies from the mean.

The Z-score of a value is calculated by subtracting the mean of the dataset from the value and then dividing by the standard deviation of the dataset. This measures how many standard deviations an element is from the mean.
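For example, the z-scores of a small toy array can be computed directly with NumPy. `np.std` uses the population standard deviation by default (it divides by the number of values), which matches the $\sigma$ definition in this section. The toy values are made up for illustration.

```python
import numpy as np

# Toy activations: four typical values and one that stands out
activations = np.array([1.0, 1.0, 1.0, 1.0, 6.0])

mu = np.mean(activations)     # 2.0
sigma = np.std(activations)   # population std (ddof=0): sqrt(20 / 5) = 2.0
z_scores = (activations - mu) / sigma

print(z_scores)
```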



To calculate a threshold $t$ for quantization based on a Z-score threshold $Z_t$, you might define $t$ as a function of $Z_t$, $\mu$, and $\sigma$, such as:

$$
t(Z_t) = \mu + Z_t \cdot \sigma
$$


Where:

- $t(Z_t)$: The quantization threshold calculated from a Z-score threshold $Z_t$.
- $Z_t$: The chosen Z-score threshold. It sets how many standard deviations from the mean an activation must lie before it is considered for special handling (e.g., removal or adjustment) ahead of the main quantization process.
- $\mu = \frac{1}{n_s} \sum_{X \in Fl(D)} X$: The mean of the activations in $Fl(D)$.
- $\sigma = \sqrt{\frac{1}{n_s} \sum_{X \in Fl(D)} (X - \mu)^2}$: The standard deviation of the activations in $Fl(D)$.
- $Fl(D)$: The activation distribution. $X$ is an individual activation and $n_s$ is the number of activations in $Fl(D)$.
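As a worked example, take the toy activations $Fl(D) = \{1, 1, 1, 1, 6\}$, so $n_s = 5$. Applying the definitions above with $Z_t = 1.5$:

$$
\mu = \frac{1+1+1+1+6}{5} = 2, \qquad
\sigma = \sqrt{\frac{4 \cdot (1-2)^2 + (6-2)^2}{5}} = \sqrt{\frac{20}{5}} = 2
$$

$$
t(1.5) = \mu + Z_t \cdot \sigma = 2 + 1.5 \cdot 2 = 5
$$

The value 6 lies above $t(1.5) = 5$ (its z-score is $(6-2)/2 = 2 > 1.5$), so it would be flagged. The four 1s, each with a z-score of $-0.5$, would be kept.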


This equation for $t(Z_t)$ allows you to set a threshold based on the statistical distribution of activations, identifying values that are unusually high or low relative to the rest of the data. These identified values can then be removed before applying the main quantization algorithm.
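Below is a minimal NumPy sketch of this filtering step. It assumes a value is discarded when it lies more than $Z_t$ standard deviations from the mean on either side. The helper name `filter_by_z_score` is hypothetical, and this is not MCT's internal implementation.

```python
import numpy as np

def filter_by_z_score(x, z_t):
    """Hypothetical helper: drop values whose |z-score| exceeds z_t.

    Returns the filtered array and the upper bound t(Z_t) = mu + z_t * sigma.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    mu = x.mean()
    sigma = x.std()  # population std, matching the definition of sigma above
    upper = mu + z_t * sigma
    keep = np.abs(x - mu) <= z_t * sigma
    return x[keep], upper

# Synthetic activations: a dense bulk in [0, 1] plus one anomalous value
activations = np.append(np.linspace(0.0, 1.0, 1000), 50.0)

filtered, t = filter_by_z_score(activations, z_t=3)
print(f"t(3) = {t:.2f}, kept {filtered.size} of {activations.size} values, new max = {filtered.max():.2f}")
```

Here the outlier itself inflates $\sigma$, yet it still lies far beyond $t(3)$ and is removed. Every value in the bulk is kept.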

## Setup

Install and import the relevant packages:


In [None]:
TF_VER = '2.14.0'

!pip install -q tensorflow=={TF_VER}
!pip install -q mct-nightly

In [None]:
import tensorflow as tf
import keras
import model_compression_toolkit as mct
import os

Clone MCT to gain access to the tutorial scripts:

In [None]:
!git clone https://github.com/sony/model_optimization.git local_mct
!pip install -r ./local_mct/requirements.txt
import sys
sys.path.insert(0,"./local_mct")
import tutorials.resources.utils.keras_tutorial_tools as tutorial_tools


## Dataset

Load the ImageNet classification dataset and separate a small representative subsection of it to use for quantization.

In [None]:
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !mv ILSVRC2012_devkit_t12.tar.gz imagenet/
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    !mv ILSVRC2012_img_val.tar imagenet/

In [None]:
import torchvision
if not os.path.isdir('imagenet/val'):
    ds = torchvision.datasets.ImageNet(root='./imagenet', split='val')

Here we create the representative dataset. For details on this step, see the [ImageNet tutorial](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb). If you are running locally, a higher fraction of the dataset can be used.

In [None]:
REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'
BATCH_SIZE = 20
fraction = 0.001
model_version = 'MobileNet'

preprocessor = tutorial_tools.DatasetPreprocessor(model_version=model_version)
representative_dataset_gen = preprocessor.get_representative_dataset(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)
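You can smoke-test the pipeline without downloading ImageNet by using a stand-in generator with the same structure. The sketch below assumes, as in MCT's Keras examples, that `representative_data_gen` is a callable returning an iterator of lists of input batches. The random values are placeholders and will not produce meaningful thresholds. `N_ITER` and `random_representative_dataset_gen` are hypothetical names.

```python
import numpy as np

N_ITER = 2                   # hypothetical number of calibration batches
BATCH_SIZE = 20
INPUT_SHAPE = (224, 224, 3)  # MobileNet's default input resolution

def random_representative_dataset_gen():
    """Yield lists of input batches (one array per model input).

    Values are drawn from [-1, 1], the range produced by MobileNet's
    preprocessing; they are placeholders, not real images.
    """
    rng = np.random.default_rng(seed=0)
    for _ in range(N_ITER):
        yield [rng.uniform(-1.0, 1.0, size=(BATCH_SIZE, *INPUT_SHAPE)).astype(np.float32)]
```

To check that the code runs end to end, pass it as `representative_data_gen=random_representative_dataset_gen` in place of `representative_dataset_gen`.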

## MCT Quantization

In this step, we load the model and quantize it with several z-score thresholds.


First, we load MobileNet from the Keras library.

In [None]:
from tensorflow.keras.applications import MobileNet
float_model = MobileNet(weights='imagenet')

Next, we define the quantization parameters. Here we use default values apart from the z-score threshold.

In [None]:
# Specify the IMX500-v1 target platform capabilities (TPC)
tpc = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version='v1')

# Dictionary mapping each z-score to its QuantizationConfig
q_configs_dict = {}

You can edit the code below to quantize with other values of z-score.

In [None]:
# Z-score values to iterate over
z_score_values = [3,5,9]

# Iterate and build the QuantizationConfig objects
for z_score in z_score_values:
    q_config = mct.core.QuantizationConfig(
        z_threshold=z_score,
    )
    q_configs_dict[z_score] = q_config



Finally, we quantize the model. This can take some time, so grab a coffee!

In [None]:
quantized_models_dict = {}

for z_score, q_config in q_configs_dict.items():
    # Create a CoreConfig object with the current quantization configuration
    ptq_config = mct.core.CoreConfig(quantization_config=q_config)

    # Perform MCT post-training quantization
    quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
        in_model=float_model,
        representative_data_gen=representative_dataset_gen,
        core_config=ptq_config,
        target_platform_capabilities=tpc
    )

    # Update the dictionary to include the quantized model
    quantized_models_dict[z_score] = {
        "quantization_config": q_config,
        "quantized_model": quantized_model,
        "quantization_info": quantization_info
    }


### Z-Score Threshold and Distribution Visualisation

To aid understanding, we will now plot the activation distribution of one of MobileNet's ReLU activation layers.

This distribution is obtained by feeding the representative dataset through the model.
To see the activations, the model needs to be rebuilt up to and including the layer chosen for visualisation.

To see this layer's z-score threshold values, we need to calculate them manually using the equation stated in the introduction.

To plot the distribution, we first need to list the layer names. With Keras, this can be done as follows. We established the index of the layer of interest using the checks shown in the Appendix section.

In [None]:
#print layer name
print(float_model.layers[51].name)
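Instead of hard-coding index 51, you can look the index up by name. The helper below is a hypothetical convenience function. It relies only on each layer exposing a `.name` attribute, which Keras layers do.

```python
from types import SimpleNamespace

def find_layer_index(layers, name):
    """Return the position of the first layer whose .name equals `name`.

    Raises ValueError if no layer matches.
    """
    for index, layer in enumerate(layers):
        if layer.name == name:
            return index
    raise ValueError(f"No layer named {name!r}")

# Stand-in layers for illustration; with Keras, pass float_model.layers instead
layers = [SimpleNamespace(name=n) for n in ["input_1", "conv1", "conv1_relu"]]
print(find_layer_index(layers, "conv1_relu"))
```

With the model loaded earlier, `find_layer_index(float_model.layers, 'conv_dw_8_relu')` should give 51, the index used in the cell above.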

The example activation layer in the model is 'conv_dw_8_relu'.

Use this layer name to create a model ending at conv_dw_8_relu:

In [None]:
from tensorflow.keras.models import Model
layer_name1 = 'conv_dw_8_relu'

layer_output1 = float_model.get_layer(layer_name1).output
activation_model_relu = Model(inputs=float_model.input, outputs=layer_output1)

Feed the representative dataset through this model and store the output.

In [None]:
import numpy as np

activation_batches_relu = []
# Each item yielded by the generator is a list of input batches
for images in representative_dataset_gen():
    activations_relu = activation_model_relu.predict(images, verbose=0)
    activation_batches_relu.append(activations_relu)

# Flatten all batches into a single 1-D array of activation values
all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()

We can calculate the z-score thresholds for this layer using the equation stated in the introduction.

In [None]:
optimal_thresholds_relu = {}

# Calculate the mean and standard deviation of the activation data
mean = np.mean(all_activations_relu)
std_dev = np.std(all_activations_relu)

# Calculate and store the threshold for each Z-score
for zscore in z_score_values:
    optimal_threshold = zscore * std_dev + mean
    optimal_thresholds_relu[f'z-score {zscore}'] = optimal_threshold

### Distribution Plots

Here we plot the layer's activation distribution along with its z-score thresholds and the thresholds MCT selected for each quantized model.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Activation thresholds MCT selected for this layer in each quantized model.
# Index 52 of the quantized model is the activation quantizer of conv_dw_8_relu (see the Appendix).
mse_error_thresholds = {
    z_score: data["quantized_model"].layers[52].activation_holder_quantizer.get_config()['threshold'][0]
    for z_score, data in quantized_models_dict.items()
}

# Plotting
plt.figure(figsize=(10, 6))
plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Activations')
for z_score, threshold in optimal_thresholds_relu.items():
    random_color = np.random.rand(3,)
    # Dashed line: z-score threshold calculated manually above
    plt.axvline(threshold, linestyle='--', linewidth=2, color=random_color, label=f'{z_score}, z-score threshold: {threshold:.2f}')
    z_score_1 = int(z_score.split(' ')[1])  # Recover the integer z-score from the 'z-score N' key
    error_value = mse_error_thresholds[z_score_1]
    # Solid line: threshold MCT chose for the corresponding quantized model
    plt.axvline(error_value, linestyle='-', linewidth=2, color=random_color, label=f'{z_score}, MSE error threshold: {error_value:.2f}')

plt.title('Activation Distribution with Z-Score and MSE Error Thresholds - conv_dw_8_relu')
plt.xlabel('Activation Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()

The plot clearly shows the effect of the z-score on the error threshold: the lowest z-score, 3, reduces the error threshold for this layer.
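The sketch below builds intuition for why a low z-score lowers the selected threshold. It picks a threshold by a simple grid search that minimises the mean squared error between the float values and their quantized counterparts. The search runs once on the raw data and once after z-score filtering. The unsigned 8-bit quantizer, the candidate grid and the synthetic data are all assumptions. MCT's own threshold search is more elaborate.

```python
import numpy as np

def quantize_dequantize(x, threshold, n_bits=8):
    """Illustrative unsigned uniform quantizer over [0, threshold)."""
    scale = threshold / 2 ** n_bits
    return np.clip(np.round(x / scale), 0, 2 ** n_bits - 1) * scale

def mse_threshold(x, n_candidates=200):
    """Grid-search the threshold in (0, max(x)] that minimises quantization MSE."""
    candidates = np.linspace(x.max() / n_candidates, x.max(), n_candidates)
    errors = [np.mean((x - quantize_dequantize(x, t)) ** 2) for t in candidates]
    return candidates[int(np.argmin(errors))]

def filter_by_z_score(x, z_t):
    """Keep values within z_t population standard deviations of the mean."""
    return x[np.abs(x - x.mean()) <= z_t * x.std()]

# Synthetic ReLU-like activations with one anomalous value
activations = np.append(np.linspace(0.0, 1.0, 1000), 50.0)

t_raw = mse_threshold(activations)
t_filtered = mse_threshold(filter_by_z_score(activations, z_t=3))
print(f"MSE threshold without filtering: {t_raw:.2f}")
print(f"MSE threshold after z-score 3 filtering: {t_filtered:.2f}")
```

Without filtering, clipping the outlier is so costly that the search keeps the threshold near 50. Once the outlier is removed, the threshold settles close to the top of the bulk.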

## Accuracy

Finally, we can show the effect of these different z-score thresholds on the model's accuracy.

In [None]:
REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'
BATCH_SIZE = 20
fraction = 0.005
model_version = 'MobileNet'

preprocessor = tutorial_tools.DatasetPreprocessor(model_version=model_version)
evaluation_dataset = preprocessor.get_validation_dataset_fraction(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)

In [None]:
#prepare float model and evaluate
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
float_results = float_model.evaluate(evaluation_dataset)
print(f"Float model: Loss = {float_results[0]}, Accuracy = {float_results[1]}")

In [None]:
#prepare quantised models and evaluate
evaluation_results = {}

for z_score, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]

    quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])

    results = quantized_model.evaluate(evaluation_dataset, verbose=0)  # Set verbose=0 to suppress the log messages

    evaluation_results[z_score] = results

    # Print the results
    print(f"Results for {z_score}: Loss = {results[0]}, Accuracy = {results[1]}")

Here we see only very minor gains from adjusting the z-score threshold. Most simple models are likely to follow this trend. In our testing, transformer models have tended to benefit from anomaly removal. Still, it is always worth experimenting with these parameters if your quantised accuracy is distinctly lower than your float model accuracy.



## Conclusion

In this tutorial, we demonstrated the z-score thresholding step used during quantization. Please use this code to assist with choosing z-score thresholds for your own model.

We have found that, when adjusting the z-score, the sweet spot tends to be between 8 and 12. There is no change above 12, and the distribution is destroyed below 8. Finding the right value will likely require a study of your specific use case.




## Appendix

Below is a selection of code samples used to establish the best layers for plotting thresholds and distributions.

First, we list the layers whose thresholds are affected by the z-score adjustment:

In [None]:
# Initialize a dictionary to hold threshold values for comparison
thresholds_by_index = {}

# Try to access each layer for each quantized model and collect threshold values
for z_score, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]
    for layer_index in range(len(quantized_model.layers)):
        try:
            # Attempt to access the threshold value for this layer
            threshold = quantized_model.layers[layer_index].activation_holder_quantizer.get_config()['threshold'][0]
            # Store the threshold value for comparison
            if layer_index not in thresholds_by_index:
                thresholds_by_index[layer_index] = set()
            thresholds_by_index[layer_index].add(threshold)
        except Exception:
            # Layers without an activation quantizer (or threshold) are skipped
            pass

# Find indices where threshold values are not consistent
inconsistent_indices = [index for index, thresholds in thresholds_by_index.items() if len(thresholds) > 1]

print("Inconsistent indices:", inconsistent_indices)


Choosing one of these indices at random (52), we check its thresholds:

In [None]:
mse_error_thresholds = {
    z_score: data["quantized_model"].layers[52].activation_holder_quantizer.get_config()['threshold'][0]
    for z_score, data in quantized_models_dict.items()
}
print(mse_error_thresholds)

We now want to verify how the quantized model's layer indices match up with the float model's layer names. Index 52 has no matching float layer, as it is the activation quantizer of the previous layer. Index 51 matches the layer name conv_dw_8_relu, so we can use this layer to plot the distribution.

In [None]:
target_z_score = 9

# Check once that the target z-score exists, rather than once per float layer
if target_z_score not in quantized_models_dict:
    print(f"Z-Score {target_z_score} not found in quantized_models_dict.")
else:
    data = quantized_models_dict[target_z_score]
    for index, layer in enumerate(float_model.layers):
        search_string = str(layer.name)
        # Iterate over each layer of the target quantized model
        for quantized_index, quantized_layer in enumerate(data["quantized_model"].layers):
            # Substring match on the layer config, so one float layer may match several quantized layers
            if search_string in str(quantized_layer.get_config()):
                print(f"Float Model Layer Index {index} & Quantized Model Layer Index {quantized_index}: Found match in layer name {search_string}")


In [None]:
data["quantized_model"].layers[51].get_config()



Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
