# Export Quantized Keras Model

[Run this tutorial in Google Colab](https://colab.research.google.com/github/reuvenperetz/model_optimization/blob/change-keras-serial-enum/tutorials/notebooks/example_pytorch_export.ipynb)


To export a quantized Keras model, first quantize the float model using MCT
(the `model_compression_toolkit` package):





In [None]:
! pip install -q git+https://github.com/reuvenperetz/model_optimization.git@change-keras-serial-enum

In [None]:
import numpy as np
from keras.applications import ResNet50
import model_compression_toolkit as mct

# Create a model
float_model = ResNet50()
# Quantize the model. In order to export the model set new_experimental_exporter to True.
# Notice that here the representative dataset is random for demonstration only.
quantized_exportable_model, _ = mct.ptq.keras_post_training_quantization_experimental(float_model,
                                                                                      representative_data_gen=lambda: [np.random.random((1, 224, 224, 3))])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5




representative_data_gen generates a batch size of 1 which can be slow for optimization: consider increasing the batch size


100%|██████████| 1/1 [00:00<00:00,  1.25it/s]


Running quantization parameters search. This process might take some time, depending on the model size and the selected quantization methods.



Calculating quantization params:  69%|██████▉   | 86/125 [06:58<08:52, 13.66s/it]
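The warning above points out that a batch size of 1 can slow down the optimization. The following is a minimal sketch of a representative dataset callable that returns larger batches. `make_representative_data_gen` is a hypothetical helper, not part of MCT, and it assumes MCT accepts the same callable form as the lambda used above (a function returning a list with one array per model input). The data is still random and for demonstration only.

```python
import numpy as np


def make_representative_data_gen(batch_size=8, input_shape=(224, 224, 3), seed=0):
    """Hypothetical helper: build a callable shaped like the tutorial's lambda."""
    rng = np.random.default_rng(seed)

    def representative_data_gen():
        # One list entry per model input; ResNet50 has a single input.
        return [rng.random((batch_size, *input_shape), dtype=np.float32)]

    return representative_data_gen


representative_data_gen = make_representative_data_gen(batch_size=8)
batch = representative_data_gen()
print(len(batch), batch[0].shape, batch[0].dtype)  # 1 (8, 224, 224, 3) float32
```

In practice the random arrays would be replaced with preprocessed samples from the real input distribution.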



### Keras

The model is exported as a TensorFlow `.keras` model in which weights and activations are quantized but stored using a float32 dtype.
Two quantization formats are available: MCTQ and FAKELY_QUANT.
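To illustrate what "quantized but represented using a float32 dtype" means, here is a minimal NumPy sketch of fake quantization. It assumes a symmetric, per-tensor, 8-bit uniform quantizer; `fake_quantize` is a hypothetical function for illustration, and the quantizers MCT actually applies may differ (for example in threshold selection or per-channel handling).

```python
import numpy as np


def fake_quantize(weights, n_bits=8):
    """Snap values to a symmetric uniform grid but keep them as float32."""
    max_abs = float(np.max(np.abs(weights)))
    q_max = 2 ** (n_bits - 1) - 1
    # Guard against an all-zero tensor to avoid dividing by zero.
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -q_max - 1, q_max)
    return (q * scale).astype(np.float32)


rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
fq = fake_quantize(weights)

# The dtype and memory footprint are unchanged...
print(fq.dtype, fq.nbytes == weights.nbytes)  # float32 True
# ...but the tensor holds at most 2**8 distinct values.
print(len(np.unique(fq)) <= 2 ** 8)  # True
```

Because the values are still stored as 32-bit floats, a file that serializes them is not smaller than one holding the original weights.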

#### MCTQ Quantization Format

By default, `mct.exporter.keras_export_model` exports the quantized Keras model as
a `.keras` model with custom quantizers from the `mct_quantizers` module.




In [None]:
import tempfile

# Path of exported model
_, keras_file_path = tempfile.mkstemp('.keras')

# Export a keras model with mctq custom quantizers.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=keras_file_path)

Note that the exported model has the same size as the quantized exportable model, because the weights are still stored as floats.

#### Fakely-Quantized

In [None]:
# Path of exported model
_, keras_file_path = tempfile.mkstemp('.keras')

# The default serialization format (KerasExportSerializationFormat.KERAS)
# produces a .keras model. QuantizationFormat.FAKELY_QUANT exports
# fakely-quantized weights and activations.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=keras_file_path,
                                quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)

Note that the fakely-quantized model has the same size as the quantized exportable model, because the weights are
still stored as floats.



### TFLite
The TFLite serialization format supports two quantization formats: INT8 and FAKELY_QUANT.

#### INT8 TFLite

The model is exported as a TFLite model in which weights and activations are represented as 8-bit integers.
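As a rough illustration of why integer storage shrinks the model, the sketch below stores a float32 tensor as int8 values plus a single scale. It assumes symmetric per-tensor quantization; `quantize_int8` and `dequantize` are hypothetical helpers, and the exact scheme used by the TFLite converter is not reproduced here.

```python
import numpy as np


def quantize_int8(weights):
    """Map float32 values to int8 using one symmetric per-tensor scale."""
    max_abs = float(np.max(np.abs(weights)))
    # Guard against an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(weights)

# int8 uses one byte per value instead of four.
print(q.dtype, weights.nbytes // q.nbytes)  # int8 4
```

A compression ratio close to 4 for a weight-dominated model is consistent with this one-byte-per-value storage.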

In [None]:
import tempfile

# Path of exported model
_, tflite_file_path = tempfile.mkstemp('.tflite')

# Use KerasExportSerializationFormat.TFLITE for a TFLite model and QuantizationFormat.INT8 for 8-bit integers.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=tflite_file_path,
                                serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
                                quantization_format=mct.exporter.QuantizationFormat.INT8)

Compare the sizes of the float and quantized models:


In [None]:
import os

# Save float model to measure its size
_, float_file_path = tempfile.mkstemp('.keras')
float_model.save(float_file_path)

print("Float model in MB:", os.path.getsize(float_file_path) / float(2 ** 20))
print("Quantized model in MB:", os.path.getsize(tflite_file_path) / float(2 ** 20))
print(f'Compression ratio: {os.path.getsize(float_file_path) / os.path.getsize(tflite_file_path)}')
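The size comparison can also be wrapped in small reusable helpers. This is only a sketch: `file_size_mb` and `compression_ratio` are hypothetical names, and dummy files of known size stand in for the saved models so the snippet runs without TensorFlow or MCT.

```python
import os
import tempfile


def file_size_mb(path):
    """Return the file size in units of 2**20 bytes, as in the cell above."""
    return os.path.getsize(path) / float(2 ** 20)


def compression_ratio(float_path, quantized_path):
    """Return how many times smaller the quantized file is."""
    return os.path.getsize(float_path) / os.path.getsize(quantized_path)


# Dummy files stand in for the float and quantized model files.
with tempfile.TemporaryDirectory() as tmp_dir:
    float_path = os.path.join(tmp_dir, 'float_model.bin')
    quantized_path = os.path.join(tmp_dir, 'quantized_model.bin')
    with open(float_path, 'wb') as f:
        f.write(bytes(4 * 2 ** 20))
    with open(quantized_path, 'wb') as f:
        f.write(bytes(2 ** 20))

    print(file_size_mb(float_path), file_size_mb(quantized_path))  # 4.0 1.0
    print(compression_ratio(float_path, quantized_path))  # 4.0
```

Using a temporary directory also removes the files automatically once the comparison is done.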


#### Fakely-Quantized TFLite

The model is exported as a TFLite model in which weights and activations are quantized but represented as floats.

##### Usage Example

In [None]:
# Path of exported model
_, tflite_file_path = tempfile.mkstemp('.tflite')

# Use mode KerasExportSerializationFormat.TFLITE for tflite model and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights
# and activations.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=tflite_file_path,
                                serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
                                quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)





Note that the fakely-quantized model has the same size as the quantized exportable model, because the weights are
still stored as floats.


Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
