# Quant Analyzer

This notebook showcases a working code example of how to use AIMET to apply Quant Analyzer.
Quant Analyzer is a feature which performs various analyses on a model to understand how each op in the model responds to quantization.


#### Overall flow
This notebook covers the following:
1. Instantiate the example evaluation pipeline
2. Load the FP32 model
3. Apply QuantAnalyzer to the model


#### What this notebook is not
* This notebook is not designed to show state-of-the-art results.
* For example, it uses a relatively quantization-friendly model like ResNet50.
* Also, some optimization parameters are deliberately chosen to have the notebook execute more quickly.

---
## Dataset

This notebook relies on the ImageNet dataset for the task of image classification. If you already have a version of the dataset readily available, please use that. Otherwise, please download the dataset from an appropriate location (e.g. https://image-net.org/challenges/LSVRC/2012/index.php#) and convert it into tfrecords.

**Note1**: The ImageNet tfrecords dataset typically has the following characteristics, and the dataloader provided in this example notebook relies on them:
- A folder containing tfrecords files starting with **'train\*'** for training files and **'valid\*'** for validation files. Each tfrecord file should have features: **'image/encoded'** for image data and **'image/class/label'** for its corresponding class.

**Note2**: To speed up the execution of this notebook, you may use a reduced subset of the ImageNet dataset. E.g. the entire ILSVRC2012 dataset has 1000 classes, roughly 1300 training samples per class and 50 validation samples per class. But for the purpose of running this notebook, you could reduce the dataset to, say, 2 samples per class and then convert it into tfrecords. This exercise is left up to the reader and is not necessary.
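
The reduction described in Note2 could be sketched as follows. This is only an illustration, assuming the dataset is available as a list of `(path, label)` pairs; the helper name `take_per_class` and the file names are hypothetical, and the conversion to tfrecords is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def take_per_class(samples: Iterable[Tuple[str, int]], per_class: int) -> List[Tuple[str, int]]:
    """Keep at most `per_class` (path, label) pairs for every label, preserving order."""
    kept_count: Dict[int, int] = defaultdict(int)
    subset = []
    for path, label in samples:
        if kept_count[label] < per_class:
            subset.append((path, label))
            kept_count[label] += 1
    return subset


# Hypothetical file listing: 3 classes with 4 images each.
all_samples = [(f"class{c}/img{i}.JPEG", c) for c in range(3) for i in range(4)]
subset = take_per_class(all_samples, per_class=2)
print(len(subset))  # 3 classes * 2 samples each -> 6
```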


Edit the cell below and specify the directory where the downloaded ImageNet dataset is saved.

In [None]:
TFRECORDS_DIR = '/path/to/dataset/'         # Please replace this with a real directory

We suppress INFO and WARNING logs from the TensorFlow C++ backend (`TF_CPP_MIN_LOG_LEVEL = '2'`) and disable eager execution. We also set the Python logging verbosity to ERROR,
so TensorFlow will only display messages at the ERROR level or more critical.

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
tf.logging.set_verbosity(tf.logging.ERROR)

---

## 1. Example evaluation and training pipeline

The following is an example evaluation pipeline for this image classification task. This notebook only evaluates the model, so no training loop is defined here.

- **Does AIMET have any limitations on how the training, validation pipeline is written?**

    Not really. We will see later that AIMET will modify the user's model to create a QuantizationSim model which is still a TensorFlow model.
    This QuantizationSim model can be used in place of the original model when doing inference or training.

- **Does AIMET put any limitation on the interface of the evaluate() or train() methods?**

    Not really. You should be able to use your existing evaluate and train routines as-is.

In [None]:
from typing import List

from Examples.common import image_net_config
from Examples.tensorflow.utils.image_net_evaluator import ImageNetDataLoader
from Examples.tensorflow.utils.image_net_evaluator import ImageNetEvaluator

class ImageNetDataPipeline:
    """
    Provides APIs for model evaluation and finetuning using ImageNet Dataset.
    """
    
    @staticmethod
    def get_val_dataloader():
        """
        Instantiates a validation dataloader for ImageNet dataset and returns it
        """
        data_loader = ImageNetDataLoader(TFRECORDS_DIR,
                                         image_size=image_net_config.dataset['image_size'],
                                         batch_size=image_net_config.evaluation['batch_size'],
                                         format_bgr=True)

        return data_loader
    
    @staticmethod
    def evaluate(sess: tf.Session, iterations: int = None) -> float:
        """
        Given a TF session, evaluates its Top-1 accuracy on the validation dataset
        :param sess: The TF session whose graph is to be evaluated.
        :param iterations: Number of batches to use. Default is the complete dataset
        :return: The Top-1 accuracy on the validation dataset.
        """
        evaluator = ImageNetEvaluator(TFRECORDS_DIR, training_inputs=['keras_learning_phase:0'],
                                      data_inputs=['input_1:0'], validation_inputs=['labels:0'],
                                      image_size=image_net_config.dataset['image_size'],
                                      batch_size=image_net_config.evaluation['batch_size'],
                                      format_bgr=True)

        return evaluator.evaluate(sess, iterations)
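
For reference, Top-1 accuracy is the fraction of samples whose highest-scoring class matches the label. The sketch below computes it on made-up scores in plain Python; it is not the `ImageNetEvaluator` implementation, and `top1_accuracy` is a hypothetical helper name.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the ground-truth label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy batch: 4 samples, 3 classes (made-up scores).
toy_scores = [[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.6, 0.3, 0.1]]
toy_labels = [1, 0, 1, 0]
print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 predictions are correct -> 0.75
```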


---

## 2. Load the model

For this example notebook, we are going to load a pretrained ResNet50 model from Keras and convert it to a TensorFlow session. Similarly, you can load any pretrained TensorFlow model instead.


Calling clear_session() releases the global state: this helps avoid clutter from old models and layers, especially when memory is limited.


By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op.

In [None]:
from tensorflow.compat.v1.keras.applications.resnet import ResNet50

tf.keras.backend.clear_session()

model = ResNet50(weights='imagenet', input_shape=(224, 224, 3))

The following utility method in AIMET sets BN layers in the model to eval mode. This allows AIMET to more easily read the BN parameters from the graph. Eventually we will fold BN layers into adjacent conv layers.

In [None]:
from aimet_tensorflow.utils.graph import update_keras_bn_ops_trainable_flag

model = update_keras_bn_ops_trainable_flag(model, load_save_path="./", trainable=False)

AIMET features currently support TensorFlow sessions. **add_image_net_computational_nodes_in_graph** adds an output layer, softmax and loss functions to the ResNet50 model graph.

In [None]:
from Examples.tensorflow.utils.add_computational_nodes_in_graph import add_image_net_computational_nodes_in_graph

sess = tf.keras.backend.get_session()
add_image_net_computational_nodes_in_graph(sess, model.output.name, image_net_config.dataset['images_classes'])

Since all tensorflow input and output tensors have names, we identify the tensors needed by AIMET APIs here. 

In [None]:
starting_op_names = [model.input.name.split(":")[0]]
output_op_names = [model.output.name.split(":")[0]]
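
A TensorFlow tensor name has the form `<op name>:<output index>`, which is why the suffix is split off above. The short sketch below shows the same string handling in isolation; `input_1:0` is the input tensor used elsewhere in this notebook, while the output tensor name is a hypothetical example.

```python
def op_name_from_tensor_name(tensor_name: str) -> str:
    """Strip the ':<output index>' suffix from a TensorFlow tensor name."""
    return tensor_name.split(":")[0]


print(op_name_from_tensor_name("input_1:0"))              # input_1
print(op_name_from_tensor_name("predictions/Softmax:0"))  # hypothetical output tensor name
```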

We check whether TensorFlow can use a CUDA device. This example code will use CUDA if it is available in your current execution environment.

In [None]:
use_cuda = tf.test.is_gpu_available(cuda_only=True)

---

## 3. Apply QuantAnalyzer to the model

QuantAnalyzer requires two functions to be defined by the user for passing data through the model:

**Forward pass callback**

One function will be used to pass representative data through a quantized version of the model to calibrate quantization parameters.
This function should be fairly simple - use the existing train or validation data loader to extract some samples and pass them to the model.
We don't need to compute any loss metrics, so we can just ignore the model output.

The function **must** take two arguments, the first of which will be the session to run the forward pass on.
The second argument can be anything additional which the function requires to run, and can be in the form of a single item or a tuple of items.

If no additional argument is needed, the user can specify a dummy "_" parameter for the function.
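
The two-argument contract described above can be illustrated without TensorFlow. In this sketch, `SimpleCallback` and `run_callback` are hypothetical stand-ins that only mimic how a wrapper might hand `(session, extra_args)` to the user function; they are not AIMET classes, and a plain string stands in for the session.

```python
from typing import Any, Callable, Optional


class SimpleCallback:
    """Hypothetical stand-in for a callback wrapper; not an AIMET class."""

    def __init__(self, func: Callable[[Any, Any], Any], func_callback_args: Optional[Any] = None):
        self.func = func
        self.args = func_callback_args


def run_callback(callback: SimpleCallback, session: Any) -> Any:
    """Invoke the wrapped function with exactly two arguments: (session, extra_args)."""
    return callback.func(session, callback.args)


def forward_pass_no_args(session, _):
    # No extra argument is needed, so the second parameter is a dummy.
    return f"ran {session}"


def forward_pass_with_args(session, args):
    # Several extra items travel together as one tuple.
    num_samples, batch_size = args
    return f"ran {session} on {num_samples} samples in batches of {batch_size}"


print(run_callback(SimpleCallback(forward_pass_no_args), "fake_session"))
print(run_callback(SimpleCallback(forward_pass_with_args, (500, 32)), "fake_session"))
```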

A few pointers regarding the forward pass data samples:

- In practice, we need a very small percentage of the overall data samples for computing encodings.
  For example, the training dataset for ImageNet has about 1.28M samples. For computing encodings we only need 500 to 1000 samples.
- It may be beneficial if the samples used for computing encodings are well distributed.
  Not all classes need to be covered, since we are only looking at the range of values at every op activation.
  However, we definitely want to avoid an extreme scenario where, for example, only 'dark' or only 'light' samples are used. Using only pictures captured at night might not give ideal results.
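
As a rough guide for the sample counts mentioned above, the number of batches to draw for a given sample budget is a ceiling division. The batch sizes below are assumptions chosen for illustration, and `batches_needed` is a hypothetical helper.

```python
import math


def batches_needed(num_samples: int, batch_size: int) -> int:
    """Smallest number of batches that covers at least `num_samples` samples."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


for batch_size in (32, 64):
    print(batch_size, batches_needed(500, batch_size))
# 32 -> 16 batches (512 samples), 64 -> 8 batches (512 samples)
```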

The following shows an example of a routine that passes samples through the model for computing encodings.
This routine can be written in many different ways; this is just an example.
No loss or other evaluation metric is needed, so the outputs are discarded. Labels are fed here only because the `top1-acc` op that the routine runs depends on the labels placeholder.

In [None]:
def pass_calibration_data(session: tf.compat.v1.Session, _):
    data_loader = ImageNetDataPipeline.get_val_dataloader()
    batch_size = data_loader.batch_size

    input_label_tensors = [session.graph.get_tensor_by_name('input_1:0'),
                           session.graph.get_tensor_by_name('labels:0')]
    
    train_tensors = [session.graph.get_tensor_by_name('keras_learning_phase:0')]
    train_tensors_dict = dict.fromkeys(train_tensors, False)
    
    eval_outputs = [session.graph.get_operation_by_name('top1-acc').outputs[0]]

    samples = 500

    batch_cntr = 0
    for input_label in data_loader:
        input_label_tensors_dict = dict(zip(input_label_tensors, input_label))

        feed_dict = {**input_label_tensors_dict, **train_tensors_dict}

        with session.graph.as_default():
            _ = session.run(eval_outputs, feed_dict=feed_dict)

        batch_cntr += 1
        if (batch_cntr * batch_size) > samples:
            break

In order to pass this function to QuantAnalyzer, we need to wrap it in a CallbackFunc object, as shown below.
The CallbackFunc takes two arguments: the callback function itself, and the inputs to pass into the callback function.

In [None]:
from aimet_common.utils import CallbackFunc

forward_pass_callback = CallbackFunc(func=pass_calibration_data, func_callback_args=None)


---

**Evaluation callback**

The second function will be used to evaluate the model, and needs to return an accuracy metric.
Here, the user should pass as much data through the model as they need to evaluate its accuracy.

Like the forward pass callback, this function also must take exactly two arguments: the session to evaluate, and any additional argument needed for the function to work.
The second argument can be a tuple of items in case multiple items are needed.

We will be using the ImageNetDataPipeline's evaluate defined above for this purpose.
Like the forward pass callback, we need to wrap the evaluation callback in a CallbackFunc object as well.

In [None]:
data_pipeline = ImageNetDataPipeline()
eval_callback = CallbackFunc(func=ImageNetDataPipeline.evaluate)

---

**Creating unlabeled dataset and defining number of batches for MSE loss per op analysis**

An optional analysis step in QuantAnalyzer calculates the MSE loss per op in the model, comparing the op outputs from the original FP32 model vs. a quantized model.
To perform this step, the user needs to also provide an unlabeled Dataset to QuantAnalyzer.

We will demonstrate this step by using the ImageNetDataLoader imported above.

In [None]:
dataset = data_pipeline.get_val_dataloader().dataset
    
with dataset._graph.as_default():
    unlabeled_dataset = dataset.map(lambda x,y: x)
num_batches = 4 
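
To make the per-op metric concrete, the sketch below computes the mean squared error between an op's FP32 output and its quantized output on made-up numbers. The op names and values are hypothetical, and this is not how QuantAnalyzer collects the outputs internally.

```python
from typing import Dict, Sequence, Tuple


def mean_squared_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Average of squared element-wise differences between two equally sized outputs."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and of equal length")
    return sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)


# Hypothetical (fp32_output, quantized_output) pairs for two ops.
op_outputs: Dict[str, Tuple[Sequence[float], Sequence[float]]] = {
    "conv1": ([0.0, 0.5, 1.0, 1.5], [0.0, 0.5, 1.0, 1.5]),  # unaffected by quantization
    "fc":    ([0.0, 0.5, 1.0, 1.5], [0.0, 0.5, 1.0, 1.0]),  # last value was clipped
}

per_op_mse = {name: mean_squared_error(fp32, quant) for name, (fp32, quant) in op_outputs.items()}
print(per_op_mse)  # {'conv1': 0.0, 'fc': 0.0625}
```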

---
We are now ready to apply QuantAnalyzer.

In [None]:
from aimet_tensorflow.quant_analyzer import QuantAnalyzer

quant_analyzer = QuantAnalyzer(sess, start_op_names=starting_op_names, output_op_names=output_op_names,
                               forward_pass_callback=forward_pass_callback, eval_callback=eval_callback, use_cuda=use_cuda)


Finally, to start the analyzer, we call analyze().

A few of the parameters are explained here:
- **quant_scheme**:
    - We set this to "post_training_tf_enhanced".
      With this choice of quant scheme, AIMET will use the TF Enhanced quant scheme to initialize the quantization parameters like scale/offset.
- **default_output_bw**: Setting this to 8 means that we are asking AIMET to perform all activation quantizations in the model using integer 8-bit precision.
- **default_param_bw**: Setting this to 8 means that we are asking AIMET to perform all parameter quantizations in the model using integer 8-bit precision.
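
To illustrate what scale/offset and an 8-bit width mean, here is a minimal sketch of affine quantization derived from a plain min/max range. It is an assumption-laden illustration, not the TF Enhanced scheme (whose range selection is more involved) and not AIMET's implementation; the function names are hypothetical.

```python
from typing import Tuple


def compute_encoding(min_val: float, max_val: float, bitwidth: int = 8) -> Tuple[float, int]:
    """Derive (scale, offset) from a min/max range; the range is widened to include zero."""
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    if max_val == min_val:
        raise ValueError("range must have non-zero width")
    num_steps = 2 ** bitwidth - 1           # 255 steps for 8 bits
    scale = (max_val - min_val) / num_steps
    offset = round(min_val / scale)         # integer grid position of min_val
    return scale, offset


def quantize_dequantize(x: float, scale: float, offset: int, bitwidth: int = 8) -> float:
    """Round x to the integer grid, clamp to the representable range, map back to float."""
    num_steps = 2 ** bitwidth - 1
    q = round(x / scale) - offset
    q = max(0, min(num_steps, q))
    return (q + offset) * scale


scale, offset = compute_encoding(-32.0, 95.5)    # made-up activation range
print(scale, offset)                              # 0.5 -64
print(quantize_dequantize(1.3, scale, offset))    # rounded to the grid: 1.5
print(quantize_dequantize(100.0, scale, offset))  # clipped to the max: 95.5
```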

There are other parameters that are set to default values in this example.
Please check the AIMET API documentation of QuantizationSimModel to see reference documentation for all the parameters.

When the analyze method is called, the following analyses are run:
- Compare fp32 accuracy, accuracy with only parameters quantized and accuracy with only activations quantized
- For each op, track the model accuracy when quantization for all other ops is disabled (enabling quantization for only one op in the model at a time)
- For each op, track the model accuracy when quantization for all other ops is enabled (disabling quantization for only one op in the model at a time)
- Track the minimum and maximum encoding parameters calculated by each quantizer in the model as a result of forward passes through the model with representative data
- When the TF Enhanced quantization scheme is used, track the histogram of tensor ranges seen by each quantizer in the model as a result of forward passes through the model with representative data
- Track the MSE loss seen at each op by comparing op outputs of the original fp32 model vs. a quantized model, when the user has provided an unlabeled dataset and a number of batches
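
The two per-op sweeps in the list above follow a simple pattern, sketched here on a toy model. Everything in the sketch is made up for illustration: the op names, the baseline accuracy, the per-op accuracy drops, and the additive `evaluate` function, which stands in for a real evaluation callback.

```python
from typing import Dict, Iterable, Set

# Toy model: quantizing an op lowers accuracy by a fixed, made-up amount (percentage points).
BASELINE_ACCURACY = 76.0
ACCURACY_DROP: Dict[str, float] = {"conv1": 0.5, "conv2": 4.0, "fc": 0.25}


def evaluate(quantized_ops: Iterable[str]) -> float:
    """Stand-in for an evaluation callback: accuracy with the given ops quantized."""
    return BASELINE_ACCURACY - sum(ACCURACY_DROP[op] for op in quantized_ops)


def per_op_quant_enabled(ops: Set[str]) -> Dict[str, float]:
    """Accuracy when only one op is quantized at a time."""
    return {op: evaluate({op}) for op in sorted(ops)}


def per_op_quant_disabled(ops: Set[str]) -> Dict[str, float]:
    """Accuracy when every op except one is quantized."""
    return {op: evaluate(ops - {op}) for op in sorted(ops)}


all_ops = set(ACCURACY_DROP)
enabled = per_op_quant_enabled(all_ops)
disabled = per_op_quant_disabled(all_ops)
print(enabled)   # {'conv1': 75.5, 'conv2': 72.0, 'fc': 75.75}
print(disabled)  # {'conv1': 71.75, 'conv2': 75.25, 'fc': 71.5}
print(min(enabled, key=enabled.get))  # most sensitive op in this toy setup: conv2
```

In this toy setup the same op stands out in both sweeps: it gives the lowest accuracy when quantized alone and the highest accuracy when it alone is left unquantized.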

In [None]:
from aimet_common.defs import QuantScheme

quant_analyzer.analyze(default_param_bw=8, default_output_bw=8,
                       quant_scheme=QuantScheme.post_training_tf_enhanced,
                       config_file=None,
                       unlabeled_dataset=unlabeled_dataset, num_batches=num_batches,
                       results_dir='./tmp/')

AIMET will also output .html plots and json files where appropriate for each analysis to help visualize the data.

The following output files will be produced, in a folder specified by the user:

```
results_dir
|-- per_op_quant_enabled.html
|-- per_op_quant_enabled.json
|-- per_op_quant_disabled.html
|-- per_op_quant_disabled.json
|-- min_max_ranges
|   |-- activations.html
|   |-- activations.json
|   |-- weights.html
|   +-- weights.json
|-- activations_pdf
|   |-- quant_op_name0.html
|   |-- quant_op_name1.html
|   |-- ...
|   +-- quant_op_nameN.html
|-- weights_pdf
|   |-- op1
|   |   |-- param_name_{channel_index_0}.html
|   |   |-- param_name_{channel_index_1}.html
|   |   |-- ...
|   |   +-- param_name_{channel_index_x}.html
|   |-- op2
|   |   |-- param_name_{channel_index_0}.html
|   |   |-- param_name_{channel_index_1}.html
|   |   |-- ...
|   |   +-- param_name_{channel_index_y}.html
|   |-- ...
|   +-- opn
|       |-- param_name_{channel_index_0}.html
|       |-- param_name_{channel_index_1}.html
|       |-- ...
|       +-- param_name_{channel_index_z}.html
|-- per_op_mse_loss.html
+-- per_op_mse_loss.json
```

#### Per op analysis by enabling/disabling quantization ops

- per_op_quant_enabled.html: A plot with ops on the x-axis and model accuracy on the y-axis, where each op's accuracy represents the model accuracy when all quantizers in the model are disabled except for that op's parameter and activation quantizers.
- per_op_quant_enabled.json: A json file containing the data shown in per_op_quant_enabled.html, associating op names with model accuracy.
- per_op_quant_disabled.html: A plot with ops on the x-axis and model accuracy on the y-axis, where each op's accuracy represents the model accuracy when all quantizers in the model are enabled except for that op's parameter and activation quantizers.
- per_op_quant_disabled.json: A json file containing the data shown in per_op_quant_disabled.html, associating op names with model accuracy.

![per_op_quant_enabled.html](images/tf_quant_analyzer_per_op_quant_enabled.png)
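
The json files can also be inspected programmatically. The sketch below ranks ops from lowest to highest accuracy, assuming the file is a flat mapping of op name to accuracy; that schema is an assumption based on the description above, so check an actual output file first. The demo writes a small made-up file to a temporary folder rather than reading real results.

```python
import json
import os
import tempfile
from typing import List, Tuple


def rank_ops_by_accuracy(json_path: str) -> List[Tuple[str, float]]:
    """Load an {op_name: accuracy} mapping and sort it from lowest to highest accuracy."""
    with open(json_path) as f:
        op_to_accuracy = json.load(f)
    return sorted(op_to_accuracy.items(), key=lambda item: item[1])


# Made-up data standing in for per_op_quant_enabled.json.
with tempfile.TemporaryDirectory() as results_dir:
    path = os.path.join(results_dir, "per_op_quant_enabled.json")
    with open(path, "w") as f:
        json.dump({"conv1": 0.755, "conv2": 0.72, "fc": 0.7575}, f)
    ranked = rank_ops_by_accuracy(path)

for op_name, accuracy in ranked:
    print(f"{op_name}: {accuracy}")  # lowest accuracy (most sensitive op) first
```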

#### Encoding min/max ranges

- min_max_ranges: A folder containing the following sets of files:
    - activations.html: A plot with output activations on the x-axis and min-max values on the y-axis, where each output activation's range represents the encoding min and max parameters computed during forward pass calibration.
    - activations.json: A json file containing the data shown in activations.html, associating op names with min and max encoding values.
    - weights.html: A plot with parameter names on the x-axis and min-max values on the y-axis, where each parameter's range represents the encoding min and max parameters computed during forward pass calibration.
    - weights.json: A json file containing the data shown in weights.html, associating parameter names with min and max encoding values.

![min_max_ranges.html](images/tf_quant_analyzer_min_max_range_weights.png)

#### PDF of statistics

- (If TF Enhanced quant scheme is used) activations_pdf: A folder containing html files for each op, plotting the histogram of tensor values seen for that op's output activation during forward pass calibration.
- (If TF Enhanced quant scheme is used) weights_pdf: A folder containing subfolders for each op with weights.
  Each op's folder contains html files for each parameter of that op, with a histogram plot of tensor values seen for that parameter during forward pass calibration.

![weights_pdf.html](images/tf_quant_analyzer_pdf.png)
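
For intuition about what these plots show, the sketch below builds a normalized histogram (an empirical PDF) from a handful of made-up tensor values. The bin count and value range are arbitrary choices for illustration, and this is not the code AIMET uses to collect statistics.

```python
from typing import List, Sequence


def histogram_pdf(values: Sequence[float], num_bins: int, lo: float, hi: float) -> List[float]:
    """Return the density per bin of `values` over [lo, hi]; values outside are ignored."""
    if num_bins <= 0 or hi <= lo:
        raise ValueError("need num_bins > 0 and hi > lo")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lo <= v <= hi:
            # The upper edge belongs to the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_bins
    return [c / (total * width) for c in counts]


# Made-up tensor values, binned into 4 bins of width 0.5 over [-1, 1].
tensor_values = [-1.0, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 1.0]
pdf = histogram_pdf(tensor_values, num_bins=4, lo=-1.0, hi=1.0)
print(pdf)  # [0.25, 0.5, 0.75, 0.5]
```

Multiplying each density by the bin width and summing gives 1, which is what makes the histogram a PDF rather than a raw count.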

#### Per op MSE loss
- (Optional, only enabled when the user has provided an unlabeled dataset and a number of batches) per_op_mse_loss.html: A plot with ops on the x-axis and MSE loss on the y-axis, where each op's MSE loss represents the MSE seen when comparing that op's outputs in the FP32 model vs. the quantized model.
- (Optional, only enabled when the user has provided an unlabeled dataset and a number of batches) per_op_mse_loss.json: A json file containing the data shown in per_op_mse_loss.html, associating op names with MSE loss.

![per_op_mse_loss.html](images/tf_quant_analyzer_mse_loss.png)