# Network Editor Usage

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb)

## Introduction

In this tutorial, we demonstrate how to use the Model Compression Toolkit (MCT) to quantize a simple Keras model and to modify the quantization configuration of specific layers using MCT's network editor. Our example model consists of a Conv2D layer followed by a Dense layer. We first quantize this model with the default configuration and inspect the bit width of each layer's weights. We then add an edit rule that quantizes the Conv2D layer with a different bit width, showing how MCT lets you customize quantization per layer.

First, we install MCT and import the required modules:

In [None]:
TF_VER = '2.14.0'

!pip install -q tensorflow=={TF_VER}
!pip install -q mct-nightly

In [2]:
import model_compression_toolkit as mct
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Dense
from tensorflow.keras.models import Model

Now, we create a simple Keras model with a Conv2D layer and a Dense layer:

In [3]:
input_shape = (16, 16, 3)

inputs = Input(shape=input_shape)
x = Conv2D(filters=1, kernel_size=(3, 3))(inputs)
x = Dense(units=10)(x)
model = Model(inputs=inputs, outputs=x)

In this tutorial, for demonstration purposes and to expedite the process, we create a simple representative dataset generator using random data. This generator produces a batch of random input data matching the model's input shape.

In [4]:
batch_size = 1
def representative_data_gen():
    yield [np.random.randn(batch_size, *input_shape)]
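
The generator above yields a single batch. As a minimal sketch of how the same pattern extends to several calibration batches, the variant below yields `n_iter` batches of random data. The names `n_iter` and `multi_batch_representative_data_gen` are hypothetical and chosen for this example only, and the cast to `float32` is an assumption made to match the default Keras input dtype.

```python
import numpy as np

input_shape = (16, 16, 3)  # same input shape as the model above
batch_size = 1
n_iter = 10  # hypothetical number of calibration batches


def multi_batch_representative_data_gen():
    # Each yielded item is a list holding one array per model input.
    for _ in range(n_iter):
        yield [np.random.randn(batch_size, *input_shape).astype(np.float32)]


batches = list(multi_batch_representative_data_gen())
print(len(batches), batches[0][0].shape)  # 10 (1, 16, 16, 3)
```

With real data, the random array would be replaced by a batch of preprocessed samples from the dataset.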


Let's define a function that takes a Keras model, a representative data generator, and a core configuration for quantization. The function utilizes Model Compression Toolkit's post-training quantization API:

In [5]:

def quantize_keras_mct(model, representative_data_gen, core_config):
    # The API returns the quantized model together with quantization info;
    # only the model is needed in this tutorial.
    quantized_model, _ = mct.ptq.keras_post_training_quantization(
        in_model=model,
        representative_data_gen=representative_data_gen,
        core_config=core_config
    )
    return quantized_model


In this section, we start from a default core configuration created with MCT's `CoreConfig`. After quantizing the model with this configuration, we examine the number of bits used to quantize specific layers: we retrieve and print the bit width of the 'kernel' attribute for both the Conv2D layer and the Dense layer. By default, 8 bits are used for quantization across the different layer types in a model.

In [None]:
# Use default core config for observing baseline quantized model
core_config = mct.core.CoreConfig()

quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)
conv2d_layer = quantized_model.layers[2]
conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']

dense_layer = quantized_model.layers[4]
dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']

print(f"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}")
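
The cell above selects layers by hard-coded index, which depends on how the quantized model happens to be laid out. As a hedged alternative, the sketch below collects the 'kernel' bit width of every layer that exposes a `weights_quantizers` mapping, using only the attributes already accessed above. The helper name `kernel_bits_by_layer` is hypothetical, and the sketch assumes that layers without quantized weights either lack the attribute or hold an empty mapping.

```python
def kernel_bits_by_layer(quantized_model):
    """Map layer name -> bit width of its 'kernel' quantizer, where present."""
    bits = {}
    for layer in quantized_model.layers:
        # Assumption: layers without quantized weights have no such attribute
        # (or an empty mapping), so they are skipped.
        quantizers = getattr(layer, "weights_quantizers", None)
        if quantizers and "kernel" in quantizers:
            bits[layer.name] = quantizers["kernel"].get_config()["num_bits"]
    return bits
```

Calling `kernel_bits_by_layer(quantized_model)` after the cell above should list the Conv2D and Dense layers by name, without relying on their positions in `quantized_model.layers`.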

## Edit Configuration Using an Edit Rules List

Now let's see how to customize the quantization process for specific layers using MCT's network editor. An `EditRule` is created with a `NodeTypeFilter` targeting the Conv2D layer type.

The action associated with this rule changes the quantization configuration of the 'kernel' attribute to 4 bits instead of the default 8 bits. This rule is then included in a list (`edit_rules_list`), which is passed to the `DebugConfig`.

The `DebugConfig`, with our custom rule, is then used to create a `CoreConfig`. This configuration is applied when quantizing the model, so the Conv2D layers are quantized using 4 bits while other layers follow the default setting.

In [7]:
edit_rules_list = [
    mct.core.network_editor.EditRule(
        filter=mct.core.network_editor.NodeTypeFilter(Conv2D),
        action=mct.core.network_editor.ChangeCandidatesWeightsQuantConfigAttr(attr_name='kernel', weights_n_bits=4)
    )
]

debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)
core_config = mct.core.CoreConfig(debug_config=debug_config)
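
To give a feel for what moving from 8 to 4 bits means numerically, here is a small NumPy sketch of a symmetric uniform quantizer applied to random weights. It is an illustration only and not MCT's quantizer: the function `fake_quantize_symmetric`, the max-absolute-value threshold, and the random weights are all assumptions made for this example.

```python
import numpy as np


def fake_quantize_symmetric(w, n_bits):
    # Symmetric uniform quantizer with 2**n_bits signed integer levels.
    # The threshold is simply max|w| (an assumption for this sketch).
    threshold = np.max(np.abs(w))
    scale = threshold / 2 ** (n_bits - 1)
    q_min, q_max = -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1
    return np.clip(np.round(w / scale), q_min, q_max) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal(1000)

levels, errors = {}, {}
for n_bits in (8, 4):
    w_q = fake_quantize_symmetric(w, n_bits)
    levels[n_bits] = len(np.unique(w_q))             # distinct values actually used
    errors[n_bits] = float(np.mean((w - w_q) ** 2))  # mean squared quantization error
    print(f"{n_bits} bits: {levels[n_bits]} levels, MSE {errors[n_bits]:.2e}")
```

With 4 bits there are at most 16 representable values instead of 256, so the quantization error is larger. That trade-off is the reason for being able to choose the bit width per layer.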

In this final part of the tutorial, we apply the customized quantization process to our Keras model.

By calling `quantize_keras_mct` with the `core_config` containing our edit rule, we specifically quantize the Conv2D layer using 4 bits, as per our custom configuration.

The `quantized_model` now reflects these changes. We then extract and display the number of bits used for quantization in both the Conv2D and Dense layers.

The output demonstrates the effect of our edit rule: the Conv2D layer is quantized with 4 bits while the Dense layer retains the default 8-bit quantization.

In [None]:
quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)
conv2d_layer = quantized_model.layers[2]
conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']

dense_layer = quantized_model.layers[4]
dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']

print(f"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}")

## Edit Z-Threshold for Activation Quantization

In the context of model quantization, the Z-Threshold helps in handling outliers in the activation data. Outliers in the data can hurt the quantization process, leading to less efficient and potentially less accurate models.

The Z-Threshold is used to set a boundary, beyond which extreme values in the activation data are considered outliers and are not used to determine the quantization parameters. This approach effectively filters out extreme values, ensuring a more robust and representative quantization.

Adjusting the Z-Threshold can be particularly useful during the debugging and optimization of model quantization. By tweaking this parameter, you can fine-tune the balance between model accuracy and robustness against outliers in your specific use case.

A higher Z-Threshold means more data is considered during quantization, including some outliers, which might be necessary for certain models or datasets.
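
As a rough illustration of the idea described above, the following NumPy sketch drops samples whose z-score exceeds a threshold before taking the min/max range. It is a simplified stand-in and not MCT's internal implementation. The function name `range_with_z_threshold` and the synthetic activations are assumptions made for this example.

```python
import numpy as np


def range_with_z_threshold(activations, z_threshold):
    # z-score: distance from the mean, measured in standard deviations
    mean, std = activations.mean(), activations.std()
    z_scores = np.abs(activations - mean) / std
    # Samples beyond the threshold are treated as outliers and ignored
    kept = activations[z_scores <= z_threshold]
    return float(kept.min()), float(kept.max())


rng = np.random.default_rng(0)
# 10,000 well-behaved samples plus one extreme outlier
acts = np.concatenate([rng.standard_normal(10_000), [50.0]])

range_all = range_with_z_threshold(acts, np.inf)  # infinite threshold keeps everything
range_z5 = range_with_z_threshold(acts, 5.0)      # the outlier at 50.0 is dropped
print(range_all, range_z5)
```

With an infinite threshold the single outlier stretches the range to 50, while a threshold of 5 keeps the range close to the bulk of the data, leaving more quantization levels for typical values.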

The following code demonstrates how to customize the Z-Threshold for a specific layer type (Conv2D) in a Keras model using MCT's network editor. This feature allows you to set different Z-Threshold values for different layers. By default, all layers use a threshold of infinity, so no outlier removal occurs.

In [None]:
z_threshold_target = 5
edit_rules_list = [
    mct.core.network_editor.EditRule(
        filter=mct.core.network_editor.NodeTypeFilter(Conv2D),
        action=mct.core.network_editor.ChangeCandidatesActivationQuantConfigAttr(z_threshold=z_threshold_target)
    )
]

debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)
core_config = mct.core.CoreConfig(debug_config=debug_config)
quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)

Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
