# Embedded ML - Lab 2.2: TensorFlow Lite

In this lab you will learn the basics of TensorFlow Lite, a complement of TensorFlow that allows you to optimize and run models on constrained devices. It provides a much lighter runtime than TensorFlow but it only supports a subset of the tools available in full TensorFlow.

In this lab you might be given some helper functions but you are expected to write most of the code and be able to explain it at a high level of abstraction and also to modify any part of it.

### Learning outcomes


* Explain the basic concepts associated with TensorFlow Lite
* Develop applications following the basic TensorFlow Lite workflow
* Implement post-training quantization using TensorFlow Lite tools

In [102]:
# To run this notebook, locally as a jupyter notebook, you need to install thhe proper packages.
# follow the instructions below to set up your environment.


# 1. Crea un entorno virtual usando conda o venv
#    Por ejemplo, usando conda:
#        conda create -n [myenv] python=3.8
#        conda activate [myenv]
#    O usando venv:
#        python3 -m venv [myenv]
#        source [myenv]/bin/activate
#
# 2. Activa el entorno virtual
# 3. Instala los paquetes requeridos usando pip
# 4. Ejecuta el notebook


# Instala los paquetes requeridos:
%pip install numpy -q
%pip install pandas -q
%pip install matplotlib -q
%pip install tensorflow -q
%pip install scikit-learn -q
%pip install tensorflow-hub -q
%pip install tensorflow-datasets -q
%pip install tensorflow-estimator -q


Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


### TensorFlow Lite workflow
After having built a TensorFlow model, you can convert it to the TensorFlow Lite representation. Then you can run it with the TensorFlow Lite interpreter on your development environment before exporting it and copying it to the target device.

To run the model with TensorFlow Lite you should load the model to the TensorFlow Lite interpreter, allocate the input/output tensors, pass the input data and finally run inference. Notice that TensorFlow Lite API calls are different from those of TensorFlow.

In this part of the assignment, you should create and train a simple model (e.g. a one-neuron network) with TensorFlow and then save it. Then follow the TensorFlow Lite workflow until you are able to run inference and validate the outputs.

In [103]:

# import libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import pandas as pd

import os
import logging
import sys
import contextlib

# Suprimir los logs de TensorFlow
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
tf.get_logger().setLevel(logging.ERROR)

@contextlib.contextmanager
def suppress_stdout():
    with open(os.devnull, 'w') as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:
            yield
        finally:
            sys.stdout = old_stdout


In [None]:
#### TENSORFLOW BASIC WORKFLOW

# Create the model
# using the Xor example from the firts lab

x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])            # Input data for XOR
y_train = np.array([[0], [1], [1], [0]])                        # Output data for XOR       

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, input_shape=(2,)),             # Input layer with 2 neurons
    tf.keras.layers.Dense(4, activation='relu'),            # Hidden layer with 4 neurons and ReLU activation
    tf.keras.layers.Dense(1)                                # Output layer with 1 neuron (for binary output)    
])



# Compile the model
model.compile(optimizer='adam',
                loss='mean_squared_error',
                metrics=['accuracy'])

# Train the model

model.fit(x_train, y_train, epochs=400)                     # Train the model for 400 epochs with the XOR data
model.summary()                                             # Print the model summary to see the architecture and parameters

# Save the model to a file
model.save('xor_model.keras')                               # Save the trained model to a file named 'xor_model.keras'
accuracy = model.evaluate(x_train, y_train)                 # Evaluate the model on the training data to get the accuracy
print(f"Accuracy: {accuracy[1] * 100:.2f}%")


Epoch 1/400


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.2500 - loss: 0.3247
Epoch 2/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step - accuracy: 0.2500 - loss: 0.3229
Epoch 3/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step - accuracy: 0.2500 - loss: 0.3211
Epoch 4/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step - accuracy: 0.2500 - loss: 0.3194
Epoch 5/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step - accuracy: 0.2500 - loss: 0.3177
Epoch 6/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step - accuracy: 0.2500 - loss: 0.3160
Epoch 7/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step - accuracy: 0.2500 - loss: 0.3144
Epoch 8/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step - accuracy: 0.5000 - loss: 0.3128
Epoch 9/400
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 173ms/step - accuracy: 1.0000 - loss: 0.0962
Accuracy: 100.00%


In [73]:
#### TENSORFLOR LITE BASIC WORKFLOW

# Load model
new_model = tf.keras.models.load_model('xor_model.keras')

# Convert model to TF Lite
converter = tf.lite.TFLiteConverter.from_keras_model(new_model)
tflite_model = converter.convert()

# Save TF Lite model to a file
import pathlib
tflite_model_file = pathlib.Path('xor_model.tflite')
tflite_model_file.write_bytes(tflite_model)
print(f"TF Lite model saved to {tflite_model_file}")

Saved artifact at '/tmp/tmpi779qvpv'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 2), dtype=tf.float32, name='input_layer_21')
Output Type:
  TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)
Captures:
  133988442109136: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987286654736: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987286662608: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987286651856: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987286653392: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987286652816: TensorSpec(shape=(), dtype=tf.resource, name=None)
TF Lite model saved to xor_model.tflite


W0000 00:00:1748215640.089462  510025 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1748215640.089635  510025 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-05-25 18:27:20.090017: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmpi779qvpv
2025-05-25 18:27:20.090428: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-05-25 18:27:20.090434: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpi779qvpv
2025-05-25 18:27:20.094347: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-05-25 18:27:20.115955: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmpi779qvpv
2025-05-25 18:27:20.122442: I tensorflow/cc/saved_model/loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 32503 microseconds.


In [None]:
# load the model
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()                                          # Allocate memory for the model's tensors

# Get input and output tensors
input_details = interpreter.get_input_details()                         # Get details of the input tensor
output_details = interpreter.get_output_details()                       # Get details of the output tensor    

# Check the input and output shapes
input_data = np.array([[1, 0]], dtype=np.float32)                       # Create input data for the model (XOR input)
interpreter.set_tensor(input_details[0]['index'], input_data)           # Set the input tensor with the input data

# Run the model
interpreter.invoke()

# Get the output
output_data = interpreter.get_tensor(output_details[0]['index'])        # Get the output tensor from the model
print(f"Output for [1, 0]: {output_data}")


Output for [1, 0]: [[1.0135409]]


    TF 2.20. Please use the LiteRT interpreter from the ai_edge_litert package.
    See the [migration guide](https://ai.google.dev/edge/litert/migration)
    for details.
    


### Vision model with TensorFlow Lite

In this part of the assignment, you should import a small pre-trained model for a vision application that takes at most 1 MB. Then you should follow the TensorFlow Lite workflow until you are able to run inference and obtain the same results as with TensorFlow.

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

# Load and preprocess the dataset

def preprocess_data(image, label):
    """
    Preprocess the image and label for the Fashion MNIST dataset. 

    1. Converts the image data type to float32 and normalizes it to the range [0, 1].
    2. Expands the dimensions of the image to add a channel dimension, making it suitable for convolutional layers.

    Args:
        image (tf.Tensor): The input image tensor.
        label (tf.Tensor): The label tensor corresponding to the image.
    Returns:
        tuple: A tuple containing the preprocessed image tensor with a (28,  28, 1) shape and the label tensor.
    """
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.expand_dims(image, axis=-1)
    return image, label

# load the dataset 
(train_ds, test_ds) = tfds.load('fashion_mnist', split=['train', 'train'], as_supervised=True)      # Load the Fashion MNIST dataset

# With as_supervised=True, the dataset is returned as a tuple of (image, label) pairs and compatible with the preprocess_data function.

# Preprocess the dataset
train_ds = train_ds.map(preprocess_data).shuffle(1000).batch(32)
test_ds = test_ds.map(preprocess_data).batch(32)

# Create the model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                       # Input layer with shape (28, 28, 1) for grayscale images
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),          # Convolutional layer with 16 filters and ReLU activation
    tf.keras.layers.MaxPooling2D(),                                 # Max pooling layer to reduce spatial dimensions 
    tf.keras.layers.Flatten(),                                      # Flatten the output to feed into the dense layer   
    tf.keras.layers.Dense(10, activation='softmax'),                # Dense layer with 10 units for classification (one for each class in Fashion MNIST)
])

# Compile the model
model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

# Train the model
model.fit(train_ds, epochs=3, validation_data=test_ds)


# Save the model to a file
model.save('fashion_mnist_model.keras')


Epoch 1/3
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 8ms/step - accuracy: 0.7840 - loss: 0.6388 - val_accuracy: 0.8831 - val_loss: 0.3391
Epoch 2/3
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 7ms/step - accuracy: 0.8850 - loss: 0.3318 - val_accuracy: 0.8980 - val_loss: 0.2899
Epoch 3/3
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 7ms/step - accuracy: 0.8977 - loss: 0.2943 - val_accuracy: 0.9056 - val_loss: 0.2659


In [None]:
# TFlite workflow

# Load the model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Guardar modelo TFLite
with open("fashion_mnist_model.tflite", "wb") as f:
    f.write(tflite_model)

# obtain one sample from the test dataset
test_image, test_label = next(iter(test_ds.unbatch().take(1)))                  # Get one sample from the test dataset with an python itterator               
input_data = tf.reshape(test_image, (1, 28, 28, 1)).numpy()                     # Reshape the image to match the input shape of the model and convert it to a numpy array

# Load the model
interpreter = tf.lite.Interpreter(model_path="fashion_mnist_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])



Saved artifact at '/tmp/tmp9bolw070'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_181')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  133989357766608: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133989357765840: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133989357765456: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133989357773904: TensorSpec(shape=(), dtype=tf.resource, name=None)


W0000 00:00:1748215715.345938  510025 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1748215715.345956  510025 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-05-25 18:28:35.346168: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmp9bolw070
2025-05-25 18:28:35.346569: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-05-25 18:28:35.346586: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmp9bolw070
2025-05-25 18:28:35.349389: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-05-25 18:28:35.369975: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmp9bolw070
2025-05-25 18:28:35.377320: I tensorflow/cc/saved_model/loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 31154 microseconds.
    TF 2.20. Please use the LiteRT interprete

In [77]:
# Print the label
print(f"Label: {test_label.numpy()}")
# Print the output
print("Prediction class:", np.argmax(output_data))

keras_pred = model.predict(tf.expand_dims(test_image, axis=0))
print("Keras prediction class:", np.argmax(keras_pred))

Label: 2
Prediction class: 2
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 65ms/step
Keras prediction class: 2


### Post-training quantization
Finally, in this part of the assignment you should activate quantization and convert the model again. Compare model size and accuracy of the compressed TensorFlow Lite model by using various configurations (investigate how) and against the uncompressed baseline.

In [None]:
# '========================================================================================================
# EVALUATE FUNCTIONS
# Those functions are used to evaluate the performance of the TFLite model on the Fashion MNIST dataset.
# =========================================================================================================


def preprocess(image, label):
    """
    Preprocess the image and label for the Fashion MNIST dataset.

    1. Converts the image data type to float32 and normalizes it to the range [0, 1].
    2. Expands the dimensions of the image to add a channel dimension, making it suitable for convolutional layers.

    Args:
        image (tf.Tensor): The input image tensor.
        label (tf.Tensor): The label tensor corresponding to the image.
    Returns:
        tuple: A tuple containing the preprocessed image tensor with a (28, 28, 1) shape and the label tensor.
    """
    image = tf.cast(image, tf.float32) / 255.0              # Normalize the image to the range [0, 1]
    image = tf.expand_dims(image, axis=-1)                  # Expand dimensions to add a channel dimension (28, 28, 1)
    return image, label


def evaluate_model(interpreter_path):
    """
    Evaluate the TFLite model on the Fashion MNIST test dataset.
    1. Loads the TFLite model from the specified path.
    2. Preprocesses the test dataset.
    3. Iterates through the test dataset, making predictions and comparing them to the true labels.
    4. Calculates and prints the accuracy of the model.
    Args:
        interpreter_path (str): The path to the TFLite model file.
    Returns:
        float: The accuracy of the model on the test dataset.
    """

    # Create and load the TFLite interpreter
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()                                  # Allocate memory for the model's tensors        

    
    input_details = interpreter.get_input_details()                 # Get details of the input tensor
    output_details = interpreter.get_output_details()               # Get details of the output tensor

    # Load and preprocess the Fashion MNIST test dataset
    test_ds = tfds.load('fashion_mnist', split='test', as_supervised=True)          # it loads as supervised, so it returns a tuple of (image, label)
    test_ds = test_ds.map(preprocess).batch(1)                                      # Preprocess the dataset and batch it to 1, as the model expects a single input at a time

    correct_predictions = 0
    total_predictions = 0


    for image, label in test_ds:    
        input_data = tf.reshape(image, (1, 28, 28, 1)).numpy()                      # Reshape the image to match the input shape of the model and convert it to a numpy array
        interpreter.set_tensor(input_details[0]['index'], input_data)               # Set the input tensor with the input data
        
        interpreter.invoke()                                                        # Run the model   
        output = interpreter.get_tensor(output_details[0]['index'])                 # Get the output tensor from the model
        predicted_label = np.argmax(output)                                         # Get the predicted label by finding the index of the maximum value in the output tensor

        if predicted_label == label.numpy()[0]:
            correct_predictions += 1
        total_predictions += 1

    # Calculate and print the accuracy
    accuracy = correct_predictions / total_predictions
    print(f"{interpreter_path}: Accuracy = {accuracy * 100:.2f}%")
    return accuracy


def evaluate_model_int8(interpreter_path):
    """
    Evaluate the TFLite model with INT8 quantization on the Fashion MNIST test dataset.
    1. Loads the TFLite model from the specified path.
    2. Preprocesses the test dataset.
    3. Iterates through the test dataset, making predictions and comparing them to the true labels.
    4. Calculates and prints the accuracy of the model.
    Args:
        interpreter_path (str): The path to the TFLite model file.
    Returns:
        float: The accuracy of the model on the test dataset.
    """

    # Create and load the TFLite interpreter
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()                                          # Allocate memory for the model's tensors

    input_details = interpreter.get_input_details()                         # Get details of the input tensor   
    output_details = interpreter.get_output_details()                       # Get details of the output tensor    

    # Obtain the quantization parameters for the input tensor
    input_scale, input_zero_point = input_details[0]['quantization']        


    correct_predictions = 0
    total_predictions = 0

    # Load and preprocess the Fashion MNIST test dataset
    test_ds = tfds.load('fashion_mnist', split='test', as_supervised=True)          # Load the Fashion MNIST test dataset
    test_ds = test_ds.batch(1)                                                      # Don't need a preprocess due it is already in the right format int8                                      

    for image, label in test_ds:
        # Convert the image to float32 and normalize it
        image = tf.cast(image, tf.float32) / 255.0

        # Expand dimensions to match the input shape of the model
        if image.shape[-1] == 1:
            image = tf.reshape(image, [1, 28, 28, 1])                           # Expand dimensions to add a channel dimension (28, 28, 1)

        # Convert the image to int8 format using the quantization parameters
        image = image / input_scale + input_zero_point
        image = tf.clip_by_value(image, -128, 127)                              # Clip (cut) the values to the range of int8 
        image = tf.cast(image, tf.int8).numpy()                                 # Convert the image to int8 format and convert it to a numpy array

        # Set the input tensor and invoke the interpreter
        interpreter.set_tensor(input_details[0]['index'], image)                
        interpreter.invoke()                                                     # Run the model
        output = interpreter.get_tensor(output_details[0]['index'])

        # Get the predicted label and compare it with the true label
        pred = np.argmax(output)
        correct_predictions += int(pred == label.numpy()[0])
        total_predictions += 1

    # Calculate and print the accuracy
    accuracy = correct_predictions / total_predictions
    print(f"INT8 Accuracy: {accuracy * 100:.2f}%")
    return accuracy


In [98]:
# Load the model previosly trained and saved
model = tf.keras.models.load_model('fashion_mnist_model.keras')



# ====================================
# 1. BASELINE CONVERSION
# ====================================
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model_fp32_model = converter.convert()

with open("Quantized models/fashion_mnist_model_fp32.tflite", "wb") as f:
    f.write(tflite_model_fp32_model)

size_fp32 = os.path.getsize("Quantized models/fashion_mnist_model_fp32.tflite")

# ====================================
# 2. QUANTIZATION FP16
# ====================================
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model_fp16 = converter.convert()

with open("Quantized models/fashion_mnist_model_fp16.tflite", "wb") as f:
    f.write(tflite_model_fp16)

size_fp16 = os.path.getsize("Quantized models/fashion_mnist_model_fp16.tflite")
# ====================================
# 3. QUANTIZATION INT8
# ====================================
raw_train_ds = tfds.load('fashion_mnist',split='train', as_supervised=True)

def representative_data_gen():
    for image, _ in raw_train_ds.batch(1).take(100):
        img, _ = preprocess(image, None)
        yield [img.numpy().reshape(1, 28, 28, 1)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_int8 = converter.convert()
with open("Quantized models/fashion_mnist_model_int8.tflite", "wb") as f:
    f.write(tflite_model_int8)

size_int8 = os.path.getsize("Quantized models/fashion_mnist_model_int8.tflite")
# ====================================
# 4. QUANTIZATION DYNAMIC RANGE
# ====================================
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model_dynamic_range = converter.convert()
with open("Quantized models/fashion_mnist_model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model_dynamic_range)

size_dynamic_range = os.path.getsize("Quantized models/fashion_mnist_model_dynamic_range.tflite")



Saved artifact at '/tmp/tmpzn1yp498'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer_22')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  133987734993808: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993616: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734994192: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993424: TensorSpec(shape=(), dtype=tf.resource, name=None)
Saved artifact at '/tmp/tmpiysr05qp'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer_22')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  133987734993808: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993616: TensorSpec(shape=(), dtype=tf.resource, name=None)
  13398773

W0000 00:00:1748227678.393602  510025 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1748227678.393620  510025 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-05-25 21:47:58.393838: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmpzn1yp498
2025-05-25 21:47:58.394208: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-05-25 21:47:58.394215: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpzn1yp498
2025-05-25 21:47:58.400437: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-05-25 21:47:58.417088: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmpzn1yp498
2025-05-25 21:47:58.422709: I tensorflow/cc/saved_model/loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 28876 microseconds.


Saved artifact at '/tmp/tmpxigtajdj'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer_22')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  133987734993808: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993616: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734994192: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993424: TensorSpec(shape=(), dtype=tf.resource, name=None)


W0000 00:00:1748227678.709640  510025 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1748227678.709666  510025 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-05-25 21:47:58.709881: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmpiysr05qp
2025-05-25 21:47:58.710832: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-05-25 21:47:58.710839: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpiysr05qp
2025-05-25 21:47:58.718142: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-05-25 21:47:58.737785: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmpiysr05qp
2025-05-25 21:47:58.742884: I tensorflow/cc/saved_model/loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 33007 microseconds.
W0000 00:00:1748227679.062401  510025 tf_tfl_

Saved artifact at '/tmp/tmpe3ntqlg4'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer_22')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  133987734993808: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993616: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734994192: TensorSpec(shape=(), dtype=tf.resource, name=None)
  133987734993424: TensorSpec(shape=(), dtype=tf.resource, name=None)


W0000 00:00:1748227679.531237  510025 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1748227679.531272  510025 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-05-25 21:47:59.531563: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmpe3ntqlg4
2025-05-25 21:47:59.532057: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-05-25 21:47:59.532072: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpe3ntqlg4
2025-05-25 21:47:59.535266: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-05-25 21:47:59.550258: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmpe3ntqlg4
2025-05-25 21:47:59.555988: I tensorflow/cc/saved_model/loader.cc:471] SavedModel load for tags { serve }; Status: success: OK. Took 24432 microseconds.


In [107]:
# =====================================
# EVALUATE THE MODELS
# =====================================

# List of models and their descriptions
ModelsCnn = [
    ['Original Model'],                         # FP32
    ['Fp16 Quantization'],                      # FP16
    ['INT8 Quantization'],                      # INT8
    ['Dynamic Range INT8-FP32']                 # Dynamic Range    
    
]

# list of model paths
model_paths = [
    "Quantized models/fashion_mnist_model_fp32.tflite",                 # FP32
    "Quantized models/fashion_mnist_model_fp16.tflite",                 # FP16
    "Quantized models/fashion_mnist_model_int8.tflite",                 # INT8
    "Quantized models/fashion_mnist_model_dynamic_range.tflite"         # Dynamic Range
]

# Evaluar los modelos
Acc = [
    evaluate_model(model_paths[0]),             # FP32
    evaluate_model(model_paths[1]),             # FP16
    evaluate_model_int8(model_paths[2]),        # INT8
    evaluate_model(model_paths[3])              # Dynamic Range
]

# Obtain the size of each model in KB
Size = [os.path.getsize(p) / 1024 for p in model_paths]

# Calculate the accuracy and size reduction for each model
data_list = []
for i in range(len(Acc)):
    Lost_Acc = ((Acc[0] - Acc[i]) / Acc[0]) * 100
    Lost_ModelS = ((Size[0] - Size[i]) / Size[0]) * 100
    data_list.append(ModelsCnn[i] + [Acc[i], Size[i], Lost_Acc, Lost_ModelS])

# Create a DataFrame to display the results
df = pd.DataFrame(
    data_list,
    columns=['Model', 'Accuracy', 'Size [KB]', 'Accuracy Reduction [%]', 'Size reduction [%]']
)
df.index = range(1, len(df) + 1)

# Display the DataFrame
df

    TF 2.20. Please use the LiteRT interpreter from the ai_edge_litert package.
    See the [migration guide](https://ai.google.dev/edge/litert/migration)
    for details.
    


Quantized models/fashion_mnist_model_fp32.tflite: Accuracy = 89.22%
Quantized models/fashion_mnist_model_fp16.tflite: Accuracy = 89.21%


2025-05-25 22:30:31.135934: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


INT8 Accuracy: 89.25%
Quantized models/fashion_mnist_model_dynamic_range.tflite: Accuracy = 89.24%


Unnamed: 0,Model,Accuracy,Size [KB],Accuracy Reduction [%],Size reduction [%]
1,Original Model,0.8922,108.625,0.0,0.0
2,Fp16 Quantization,0.8921,56.007812,0.011208,48.439298
3,INT8 Quantization,0.8925,29.867188,-0.033625,72.504315
4,Dynamic Range INT8-FP32,0.8924,29.484375,-0.022416,72.856732


# CONCLUSIONS

* As demonstrated in the previous lab, TensorFlow offers a comprehensive and user-friendly library for machine learning applications, enabling experimentation with a wide variety of configurations, architectures, and workflows. Moreover, this lab has shown that, when combined with TensorFlow Lite, it is possible to seamlessly convert models into optimized TensorFlow Lite formats and perform various types of quantization directly within the same framework. These optimized and quantized models are ideal for deployment on resource-constrained devices such as smartphones, single-board computers (SBCs), and other embedded systems.

* Respect the performance of the Quantized models, in the chart presented in this lab demonstrates that all quantization methods, effectively preserved model accuracy, with deviations within $±0.03\%$ of the original model $(0.8922)$, while significantly reducing size. FP16 achieved a $48.4\%$ reduction $(56.0 KB)$, whereas INT8 and Dynamic Range INT8-FP32 delivered superior compression of $~72.5–72.9\%$ $(29–30 KB)$, with INT8 even slightly improving accuracy $(0.8925)$. Given these results, INT8 quantization emerges as the optimal choice, because it offers the largest size reduction without compronmising performance,  making it ideal for resource-constrined applications. If hardware limitarions favior FP16, it remains as a vbiable alternative with moderate coompression and negligible accuracy loss. Overall, quantizaion proves highly efective for aoprimizibng this model's efficiency, 

