In [None]:
[Run this tutorial in Google Colab](https://colab.research.google.com/github/reuvenperetz/model_optimization/blob/add_tf_notebook/tutorials/keras_notebook/keras_notebook.ipynb).


## Overview

In this tutorial, you will see how to quantize a Keras model using MCT.
More specifically:

1. Train a simple `tf.keras` model for MNIST.
2. Quantize the model to 8 bits using MCT.
3. Evaluate the models and compare results.

This tutorial demonstrates a simple 8-bit quantization scheme. For more advanced quantization options, see the [API documentation](https://sony.github.io/model_optimization/api/api_docs/index.html).

For more details, see [HPTQ](https://arxiv.org/abs/2109.09113).

## Setup
Install packages and import them:

In [None]:
! pip install model-compression-toolkit
! pip install -q tensorflow
! pip install -q tensorflow-model-optimization

In [None]:
import model_compression_toolkit as mct
import numpy as np
import tensorflow as tf
from tensorflow import keras

## Train a model for MNIST
We will start by training a simple Keras model for MNIST.
This code is based on a [TensorFlow tutorial](https://www.tensorflow.org/model_optimization/guide/quantization/training_example).

In [None]:
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)


Evaluate the pretrained model:

In [None]:
_, float_model_accuracy = model.evaluate(test_images, test_labels)

print('Float model evaluation accuracy:', float_model_accuracy)
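
Because the model was compiled with `from_logits=True`, its outputs are raw logits rather than probabilities. As a rough illustration of what the accuracy metric measures, the sketch below computes accuracy from logits with NumPy. The helper name `accuracy_from_logits` and the toy arrays are hypothetical and are not part of Keras or MCT.

```python
import numpy as np

def accuracy_from_logits(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = np.argmax(logits, axis=-1)  # predicted class index per sample
    return float(np.mean(predictions == labels))

# Toy logits for 4 samples and 3 classes (made-up values, not model outputs).
toy_logits = np.array([[2.0, 0.1, -1.0],
                       [0.3, 0.2, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.0, 0.9, 0.8]])
toy_labels = np.array([0, 2, 1, 2])

print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```

The last sample is predicted as class 0 while its label is 2, so 3 of the 4 predictions are correct.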

## Quantize model using MCT

First, MCT needs a function that supplies images representative of the dataset the model was trained on, for calibration purposes.
The function is called without any arguments and should return a list of numpy arrays (one array for each model input).

Here, for example, the model has a single input of shape [28 X 28] (the `Reshape` layer adds the channel dimension inside the model), and we calibrate the model using batches of single images.
Calling representative_data_gen() should therefore return a list containing
one numpy.ndarray of shape (1, 28, 28).

In [None]:
def representative_data_gen() -> list:
    sample = train_images[np.random.randint(0,len(train_images))]
    return [np.expand_dims(sample, axis=0)]
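
To make the expected return value concrete, here is a minimal sketch that mirrors the generator above on stand-in data and inspects what it returns. The array `fake_train_images` and the function name `sketch_representative_data_gen` are hypothetical; random values are used in place of MNIST so that the snippet runs on its own.

```python
import numpy as np

# Hypothetical stand-in for the normalized MNIST training images:
# 16 random "images" of 28x28 pixels with values in [0, 1).
fake_train_images = np.random.rand(16, 28, 28)

def sketch_representative_data_gen() -> list:
    # Pick one random image and add a leading batch dimension.
    sample = fake_train_images[np.random.randint(0, len(fake_train_images))]
    return [np.expand_dims(sample, axis=0)]

batch = sketch_representative_data_gen()
print(len(batch), batch[0].shape)  # 1 (1, 28, 28)
```

The list has one entry because the model has one input, and the leading 1 is the batch size.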

Now we can call keras_post_training_quantization from MCT to quantize the
model to 8 bits. By default, [keras_post_training_quantization](https://sony.github.io/model_optimization/api/api_docs/methods/keras_post_training_quantization.html#ug-keras-post-training-quantization) uses 500 iterations of statistics collection, but fewer iterations can be used:

In [None]:
num_calibration_iterations = 1
quantized_model, _ = mct.keras_post_training_quantization(model,
                                                          representative_data_gen,
                                                          n_iter=num_calibration_iterations)
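
To give a feel for what "iterations of statistics collection" can mean, the sketch below tracks a running min/max over several calibration batches. This is a generic illustration of range calibration under simplifying assumptions, not MCT's internal implementation; `toy_data_gen` and `collect_min_max` are hypothetical names, and only the input tensor is observed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_data_gen() -> list:
    # Hypothetical generator: one batch holding a single 28x28 "image" in [0, 1).
    return [rng.random((1, 28, 28))]

def collect_min_max(data_gen, n_iter: int):
    """Track the running min/max of the first input over n_iter batches."""
    running_min, running_max = np.inf, -np.inf
    for _ in range(n_iter):
        batch = data_gen()[0]
        running_min = min(running_min, float(batch.min()))
        running_max = max(running_max, float(batch.max()))
    return running_min, running_max

observed_min, observed_max = collect_min_max(toy_data_gen, n_iter=50)
print(f"Observed range after 50 iterations: [{observed_min:.4f}, {observed_max:.4f}]")
```

In general, more iterations expose the calibration to more samples, at the cost of a longer calibration run.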


That is all! We now have a Keras model with quantized weights and activations.
Note that the weights are fake-quantized (stored as floats rather than integers), so we can only approximate the expected model size.
The expected model size is ___, which is 4 times smaller than the original float model.
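
As a rough illustration of fake quantization and of where the factor of 4 comes from, the sketch below rounds float32 values to an 8-bit integer grid and maps them back to floats. It assumes a symmetric, per-tensor scheme with a max-abs scale, which is a simplification chosen for this example and may differ from the quantizers MCT actually applies; `fake_quantize` is a hypothetical helper.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 8):
    """Symmetric per-tensor fake quantization (generic sketch, not MCT's quantizer)."""
    q_max = 2 ** (n_bits - 1) - 1                    # 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / q_max if max_abs > 0 else 1.0  # assumed max-abs scale
    q = np.clip(np.round(x / scale), -q_max, q_max).astype(np.int8)  # integer grid
    x_fake = q.astype(np.float32) * np.float32(scale)                # back to float
    return q, x_fake, scale

rng = np.random.default_rng(0)
# Random values with the same shape as the Conv2D kernel of the model above.
weights = rng.normal(size=(3, 3, 1, 12)).astype(np.float32)
q, w_fake, scale = fake_quantize(weights)

max_error = float(np.max(np.abs(weights - w_fake)))
print("dtype of fake-quantized weights:", w_fake.dtype)
print("max round-trip error:", max_error, "quantization step:", scale)
print(weights.nbytes // q.nbytes)  # 4
```

The fake-quantized tensor is still float32, but it only takes values on the 8-bit grid. Storing the integer codes instead would need 1 byte per weight rather than 4.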
Let's evaluate the quantized model:

In [None]:
# Compile the quantized model so that it can be evaluated
quantized_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
_, quantized_model_accuracy = quantized_model.evaluate(test_images, test_labels)

print('Quantized model evaluation accuracy:', quantized_model_accuracy)


We can see that the accuracy dropped slightly.

Overall, the accuracy dropped from __ to __, while the model became 4 times smaller.