# Lab 3: Embedded AI on an Arduino 

## Intro

In this lab you will apply the **optimization**, **pruning** and **quantization** techniques from the previous labs using the **LiteRT** library (previously called TensorFlow Lite \[for Microcontrollers]). <br />
Next, you will **convert this model into a binary format** which can be compiled on the **Arduino Nano 33 BLE board**, which is provided to you. Here, you will be asked to **calculate the inference time and memory consumption.** 


To be able to run the necessary scripts throughout this lab, you will need access to a GPU. You can either **make use of your own GPU** (through a Linux or Windows WSL system with a GPU-enabled TensorFlow installation, version 2.18.0) **or use Google Colab**. <br />To run notebooks in Colab, you will need to download the lab folder from Ufora, **unzip it and put it on your Google Drive** (this folder will only be a few MBs in size). You can **drag and drop** the unzipped folder into your Google Drive.<br /><br />


Next, **double click on the provided .ipynb file** for each lab, which will open Google Colab. <br />From there, fill in the necessary variables (such as the path to your Google Drive) and you will be able to **run and program the necessary code. Be sure to select a GPU under Runtime > Change runtime type.**

The **Arduino IDE** can be downloaded [here](https://www.arduino.cc/en/software) for Linux, Windows or MacOS systems.<br /> Next, you will need to put the Arduino TensorFlowLite library in [Documents]/Arduino/libraries/:

```
# Go to your Arduino libraries folder: ~/Arduino/libraries, ~/Documents/Arduino/libraries/ or My Documents\Arduino\Libraries
cd ~/Arduino/libraries
git clone https://github.com/tensorflow/tflite-micro-arduino-examples Arduino_TensorFlowLite
```

In [1]:
%pip install --user --upgrade tensorflow-model-optimization
%pip install tf_keras

# Click Runtime > Restart session
# This ensures the above installed libraries are correctly imported

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Run this code to connect your Google Drive to Colab

# from google.colab import drive
# drive.mount('/content/drive')

In [2]:
# Change to your project directory
path_to_lab = ""

## Functions
Below you can find **functions** which can be used to complete the lab. <br />
_Note: when running the below code for the first time on Google Colab, you will get a warning that you need to restart your runtime session. This is expected because the kernel needs to use the expected tensorflow version._ 

In [3]:
import tensorflow as tf
from tensorflow import keras as keras
%pip install --user --upgrade tensorflow-model-optimization
%pip install tf_keras==2.16.0
import tensorflow_model_optimization as tfmot
import numpy as np
from sklearn.metrics import accuracy_score, classification_report


def mnist_model(train=False):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28)),
        tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(6, 6), activation=tf.nn.relu, name="conv1"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation=tf.nn.relu, name="conv2"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(16, activation=tf.nn.relu, name="dense1"),
        # tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax, name="dense2")
    ])

    # dense2 already applies softmax, so the loss receives probabilities, not logits
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])

    if train:
        model.fit(x=train_images, y= train_labels, batch_size=64, epochs=10, validation_data=(test_images, test_labels))
    else:
        # model = tf.keras.models.load_model("Models/mnist.keras")
        model = tf.keras.models.load_model(path_to_lab + "Models/mnist")
    return model



## Part 1: optimize models using LiteRT

1) Start from the pruned (first three layers, 85%) + INT8 quantized model of the last lab and save the model.
2) Re-evaluate the model without any LiteRT optimizations applied (baseline performance) and with pruning + quantization (accuracy after optimizations). The accuracy should be similar to that of the previous lab (>90%). If not, contact the instructor during the lab.

In [4]:
# from code lab1: helper function to verify performance of tflite model
def verify_performance(model_path):
    # Load TFLite model and allocate tensors.
    interpreter = tf.lite.Interpreter(model_path=path_to_lab + model_path)
    interpreter.allocate_tensors()

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Test model on random input data.
    # input_shape = input_details[0]['shape']
    # test_image = test_images[0].astype(np.float32)
    # test_image = np.expand_dims(test_image, axis=0)

    # interpreter.set_tensor(input_details[0]['index'], test_image)
    # interpreter.invoke()

    # output_data = interpreter.get_tensor(output_details[0]['index'])
    # predicted_label = np.argmax(output_data)

    correct = 0
    for i in range(len(test_images)):

        # shift pixel values from [0, 255] to [-128, 127] and cast to int8 (the model's input type)
        test_image = (test_images[i] - 128).astype(np.int8)
        # add a batch dimension so the image becomes a batch of length 1
        test_image = np.expand_dims(test_image, axis=0)

        # input test_image
        interpreter.set_tensor(input_details[0]['index'], test_image)

        # run model
        interpreter.invoke()

        # get result
        output_data = interpreter.get_tensor(output_details[0]['index'])

        if np.argmax(output_data) == test_labels[i]:
            correct += 1

    accuracy = correct / len(test_images)
    model_name = model_path.split("/")[-1]
    print(f"TFLite Model ({model_name}) Accuracy: {accuracy:.4f}")

# helper functions to check diff in models
def print_model_details(model_path):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    model_name = model_path.split("/")[-1]
    print("\n")
    print(f"Model: {model_name}")
    print(f"Number of tensors: {len(interpreter.get_tensor_details())}")
    # get_signature_list() returns the model's signatures, not its ops
    print(f"Number of signatures: {len(interpreter.get_signature_list())}")

    for tensor in interpreter.get_tensor_details()[0:3]:
        print(f"Tensor Name: {tensor['name']}, Shape: {tensor['shape']}, Type: {tensor['dtype']}")
    print("\n")

def check_weight_types(model_path):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

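    # get_tensor_details() covers every tensor in the graph, so this set
    # holds the dtypes of weights and activations alike
    tensor_details = interpreter.get_tensor_details()
    weight_types = {tensor['dtype'] for tensor in tensor_details}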

    model_name = model_path.split("/")[-1]
    print(f"Model: {model_name}")
    print(f"Weight data types: {weight_types}")
    print("\n")

import zipfile
import os

def zip_model(model_path, output_zip_path=None):
    """
    Zips a model file and saves it to the specified output path.
    
    Args:
        model_path (str): Path to the model file to be zipped
        output_zip_path (str, optional): Path for the output zip file. 
                        If None, uses model_path + '.zip'
    """
    if output_zip_path is None:
        output_zip_path = model_path + '.zip'
    
    with zipfile.ZipFile(output_zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        zipf.write(model_path, os.path.basename(model_path))
    
    print(f"Model zipped to: {output_zip_path}")
    print(f"Original size: {os.path.getsize(model_path)} bytes")
    print(f"Zipped size: {os.path.getsize(output_zip_path)} bytes")
    return output_zip_path

In [5]:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Part 1: Start from the pruned + INT8 quantized model from the previous lab
train_images = train_images.astype(np.float32)
test_images = test_images.astype(np.float32)

verify_performance('Models/mnist_pruned85_quantint8.tflite')

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


TFLite Model (mnist_pruned85_quantint8.tflite) Accuracy: 0.9886


## Part 2: Convert the model to binary format

3) Convert the LiteRT model to the binary format ready to be compiled for your Arduino Nano 33 BLE.

In [10]:
! xxd -i Models/mnist_pruned85_quantint8.tflite > arduino/mnist_pruned85_quantint8.cc

**Q1: What command did you use and what does the resulting file content look like?**
- Used command: `xxd -i Models/mnist_pruned85_quantint8.tflite > arduino/mnist_pruned85_quantint8.cc`
- The file contains a C array named `Models_mnist_pruned85_quantint8_tflite[]`. It holds the binary data of the TFLite model, represented as hex values. The file also contains a variable `Models_mnist_pruned85_quantint8_tflite_len` with value `70688`, the size of the array in bytes.
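
To make the file layout described in Q1 concrete, here is a minimal Python sketch that mimics the shape of `xxd -i` output. The helper names `xxd_name` and `bytes_to_c_array` and the 6-byte demo payload are hypothetical and only serve as illustration; the real file is produced by `xxd` itself, and its naming rules are only approximated here.

```python
import re


def xxd_name(path: str) -> str:
    """Approximate how `xxd -i` derives the array name: non-alphanumerics become '_'."""
    return re.sub(r"[^0-9a-zA-Z]", "_", path)


def bytes_to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Render bytes as C source in the style of `xxd -i`: a hex array plus a length variable."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )


# Hypothetical 6-byte payload standing in for the contents of the .tflite file.
demo = bytes_to_c_array(bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46]), "demo_model", per_line=4)
print(demo)

array_name = xxd_name("Models/mnist_pruned85_quantint8.tflite")
print(array_name)  # Models_mnist_pruned85_quantint8_tflite
```

The length variable is what the Arduino sketch can use to know how many bytes the model occupies, which is the `70688` reported above.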

4) Print a C array of a few input examples (from the test dataset) to be included in a testsamples.h file at compile time, so that the model can be tested on real examples on your Arduino Nano 33 BLE.

In [6]:
def convert_array_to_carray(array, name):
    c_array = f"const signed char {name}[{array.size}] = {{\n  "
    flat_list = array.flatten()
    c_array += ", ".join(str(x) for x in flat_list)
    c_array += "\n};\n"

    return c_array    

In [7]:
num_samples = 5
model_path = "Models/mnist_pruned85_quantint8.tflite"
selected_images = test_images[:num_samples]

# get quantization parameters
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
scale, zero_point = input_details[0]['quantization']

# flatten and quantize images: q = round(real / scale) + zero_point, clipped to the int8 range
flattened_test_images = selected_images.reshape(selected_images.shape[0], -1)
int8_test_images = np.clip(np.round(flattened_test_images / scale) + zero_point, -128, 127).astype(np.int8)

# write the samples to a C header
with open("arduino/testsamples.h", "w") as f:

    f.write(f"const int num_samples = {num_samples};\n")
    for i in range(num_samples):
        array_name = f"test_sample_{i}"
        c_array_str = convert_array_to_carray(int8_test_images[i], array_name)
        f.write(f"// Label: {test_labels[i]}\n") # to add label comment
        f.write(c_array_str + "\n")

**Q2: How did you create this array and what is the output?**
1. Quantized the images with the scale and zero-point of the TFLite model (obtained from `interpreter.get_input_details()`).
1. For every image:
    1. Flattened the image into a 1D array.
    1. Added `const signed char test_sample_{i}[{array.size}] = {` to set the variable name, size and type.
    1. Added each value of the flattened array, joined with `", "`, to fill in the actual values.
    1. Finished with `};` to close the array.
    1. Added the resulting string to the `testsamples.h` file.

Example output for one array:
```
const signed char test_sample_4[784] = {
  -128, -128, ...
};
```

The size of `784` confirms that each array is a flattened `28x28` image (28 × 28 = 784).
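
The quantization step in Q2 follows the usual affine mapping between real and int8 values, `real = (q - zero_point) * scale`. The sketch below shows that mapping in plain Python. The parameters `scale = 1.0` and `zero_point = -128` are assumptions, chosen because they are consistent with the `- 128` shift in `verify_performance` and with the `-128` background values in the example output; the actual values should be read from `interpreter.get_input_details()`. The function names are hypothetical.

```python
def quantize_int8(values, scale, zero_point):
    """Map real values to int8 with q = round(r / scale) + zero_point, clamped to [-128, 127]."""
    quantized = []
    for r in values:
        q = round(r / scale) + zero_point
        quantized.append(max(-128, min(127, q)))
    return quantized


def dequantize_int8(q_values, scale, zero_point):
    """Inverse mapping: real = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Assumed quantization parameters (not read from the model here).
scale, zero_point = 1.0, -128

pixels = [0.0, 128.0, 255.0]           # black, mid-grey and white MNIST pixel values
q = quantize_int8(pixels, scale, zero_point)
print(q)                                # [-128, 0, 127]
print(dequantize_int8(q, scale, zero_point))  # [0.0, 128.0, 255.0]
```

Clamping matters because a value that falls outside the int8 range would otherwise wrap around when cast, which silently corrupts the sample.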

## Part 3: Test the model on the Arduino
5) Make a main file that uses the TensorFlow model (model.h) and the test samples (testsamples.h).
6) Be sure to git clone this library: https://github.com/tensorflow/tflite-micro-arduino-examples and put it under Arduino/libraries.
7) Run inference on the device. Make sure to measure the inference time, and afterwards also report the memory consumption.


**Q3: Copy-paste the code used to perform model inference on the Arduino and explain how you timed your results** <br />

```c
int inference(const signed char* test_sample) {
  // copy sample to input tensor
  for (int i = 0; i < 784; i++) {
    input->data.int8[i] = test_sample[i];
  }
  
  // for inference time measurement
  unsigned long start_time = micros();
  
  // actual inference
  TfLiteStatus invoke_status = interpreter->Invoke();
  
  // end and calc inference time measurement
  unsigned long end_time = micros();
  unsigned long inference_time = end_time - start_time;
  
  if (invoke_status != kTfLiteOk) {
    Serial.println("Invoke failed");
    return -1;
  }
  
  // extracting output result
  int8_t* output_data = output->data.int8;
  
  // search index with highest probability to print prediction
  int predicted_digit = 0;
  int8_t max_score = output_data[0];
  
  for (int i = 1; i < 10; i++) {
    if (output_data[i] > max_score) {
      predicted_digit = i;
      max_score = output_data[i];
    }
  }
  
  // logging inference duration (micros() returns microseconds)
  Serial.print("Inference time: ");
  Serial.print(inference_time);
  Serial.println(" microseconds");
  
  return predicted_digit;
}
```
&rarr; I was not sure whether the timing should also include copying the input tensor and finding the predicted digit, so I measured both.

**Q4: What inference time do you get?**
```
Test Sample 0:
Inference time: 268179 microseconds
Prediction time: 268323 microseconds
Predicted digit: 7

Test Sample 1:
Inference time: 268004 microseconds
Prediction time: 268165 microseconds
Predicted digit: 2

Test Sample 2:
Inference time: 268007 microseconds
Prediction time: 268169 microseconds
Predicted digit: 1

Test Sample 3:
Inference time: 268136 microseconds
Prediction time: 268281 microseconds
Predicted digit: 0

Test Sample 4:
Inference time: 268031 microseconds
Prediction time: 268176 microseconds
Predicted digit: 4
```
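&rarr; So there is no real difference between the inference time and the prediction time: the gap is roughly 150 microseconds on about 268 ms, which is less than 0.1%.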
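
As a quick check on the Q4 numbers, the following Python sketch averages the five logged measurements and computes the overhead of the extra steps around `Invoke()`. The values are copied from the serial output above; the variable names are only illustrative.

```python
from statistics import mean

# Values in microseconds, copied from the serial log for test samples 0 to 4.
inference_us = [268179, 268004, 268007, 268136, 268031]
prediction_us = [268323, 268165, 268169, 268281, 268176]

mean_inference_us = mean(inference_us)
mean_prediction_us = mean(prediction_us)
overhead_us = mean_prediction_us - mean_inference_us

print(f"Mean inference time:  {mean_inference_us / 1000:.2f} ms")   # 268.07 ms
print(f"Mean prediction time: {mean_prediction_us / 1000:.2f} ms")  # 268.22 ms
print(f"Overhead: {overhead_us:.1f} us "
      f"({100 * overhead_us / mean_inference_us:.3f}% of inference)")
print(f"Throughput: {1_000_000 / mean_inference_us:.2f} inferences per second")
```

At roughly 268 ms per image, the board handles a little under four inferences per second, and the time spent copying the input and searching for the maximum score is negligible next to `Invoke()`.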

**Q5: What memory consumption of the model on the Arduino did you measure? Did you need to change anything to the allocated tensor memory to accommodate the model size?** <br />

```
Free memory before model initialization: -8041 bytes
Input tensor dimensions: 3 dimensions with shape: [1, 28, 28]
Free memory after model initialization: -8041 bytes
Memory consumed by model: 69792 bytes
```

The sketch exceeded the memory of the Arduino (hence the `-8041 bytes` reading), but there were no OOM errors, so I did not change the allocated tensor memory.
