<a href="https://colab.research.google.com/github/jvishnuvardhan/TF_Lite/blob/master/Post_training_quantization_Guides.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Post-training quantization Guides

Post-training quantization includes general techniques that, with little degradation in model accuracy, reduce
*   CPU and hardware accelerator latency
*   processing
*   power, and
*   model size

These techniques can be applied to an already-trained float TensorFlow model during TensorFlow Lite conversion, where they are enabled as options in the TensorFlow Lite converter.

Post-training quantization types:
*   Float model (not quantized; the baseline for comparison)
*   Post-training float16 quantization
*   Post-training dynamic range quantization
*   Post-training full integer quantization
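As a rough illustration of what float16 quantization trades away, the sketch below rounds a few made-up weight values to half precision using only the Python standard library. It is not how the TensorFlow Lite converter is implemented; the helper name `to_float16_and_back` and the sample weights are assumptions chosen for this example.

```python
import struct


def to_float16_and_back(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and return it as a float."""
    # The 'e' format code packs a value into 2 bytes (binary16).
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 0.0004]
restored = [to_float16_and_back(w) for w in weights]

# Storage cost of the same weights as float32 ('f') versus float16 ('e').
bytes_float32 = len(struct.pack(f"<{len(weights)}f", *weights))
bytes_float16 = len(struct.pack(f"<{len(weights)}e", *weights))

for original, rounded in zip(weights, restored):
    print(f"{original:>10.6f} -> {rounded:>10.6f} (error {abs(original - rounded):.2e})")
print(f"float32: {bytes_float32} bytes, float16: {bytes_float16} bytes")
```

The stored weights take half the space, and each value picks up a small rounding error, which is why float16 quantization usually costs little accuracy.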





**1. Quantizing weights**.  

Weights can be converted to types with reduced precision, such as 16-bit floats or 8-bit integers. We generally recommend 16-bit floats for GPU acceleration and 8-bit integers for CPU execution.

At inference, the most computationally intensive parts are computed with 8 bits instead of floating point. There is some inference-time performance overhead relative to quantizing both weights and activations, which is described in section 2.

For example, here is how to specify 8-bit integer weight quantization:

In [0]:
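import tensorflow as tf

# saved_model_dir is a placeholder: set it to the path of your SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Quantizes weights to int8 (activations remain float).
# OPTIMIZE_FOR_SIZE is deprecated in newer TensorFlow releases, where it behaves like Optimize.DEFAULT.
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()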
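
To make the idea of 8-bit weight quantization more concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is a conceptual illustration under simplifying assumptions (one scale for the whole tensor, values mapped to [-127, 127]), not the converter's actual implementation; the function names and sample weights are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weight values, chosen only for illustration.
weights = [0.52, -1.27, 0.003, 0.9]
q_weights, scale = quantize_symmetric_int8(weights)
restored = dequantize(q_weights, scale)

for w, q, r in zip(weights, q_weights, restored):
    print(f"{w:>8.4f} -> {q:>5d} -> {r:>8.4f}")
```

Each weight is stored in one byte instead of four, and the round-trip error for any in-range value is at most half of `scale`. Small weights such as `0.003` can collapse to zero, which is one source of the accuracy loss that quantization can introduce.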

**2. Full integer quantization of weights and activations**.  
Improve latency, processing, and power usage, and get access to integer-only hardware accelerators by making sure both weights and activations are quantized. This requires a small representative dataset, which the converter uses to calibrate the range of the activations.

In [0]:
import tensorflow as tf

def representative_dataset_gen():
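  # num_calibration_steps is a placeholder: the number of calibration samples to yield.
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    # input_value is a placeholder for that array (renamed to avoid shadowing the built-in input()).
    yield [input_value]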

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen  # calibration data used to quantize activations to int8
tflite_quant_model = converter.convert()
# The resulting model will still take float input and output for convenience
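
The role of the representative dataset can be sketched in plain Python: observe the range of activation values over a few calibration batches, then derive a scale and zero point that map that range onto int8. This is a simplified min/max calibration written for illustration; the converter's real calibration logic may differ, and all names and sample values below are assumptions.

```python
def calibrate_range(batches):
    """Track the smallest and largest activation seen across calibration batches."""
    lo, hi = 0.0, 0.0  # include 0.0 so that zero stays exactly representable
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo, hi, qmin=-128, qmax=127):
    """Derive a scale and zero point that map [lo, hi] onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an empty range
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Convert a float activation to int8, clamping values outside the range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Convert an int8 activation back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activation values standing in for real calibration data.
calibration_batches = [[0.0, 0.5, 1.2], [0.1, 2.55, 0.7], [0.0, 1.9, 0.3]]
lo, hi = calibrate_range(calibration_batches)
scale, zero_point = affine_params(lo, hi)

for x in (0.0, 1.2, 2.55, 10.0):
    q = quantize(x, scale, zero_point)
    print(f"{x:>6.2f} -> {q:>5d} -> {dequantize(q, scale, zero_point):>6.2f}")
```

Values outside the calibrated range, such as `10.0` here, are clamped to the largest representable value. That is why the calibration samples should resemble the inputs the model will see in practice.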