# Model Optimization Notebook

This Jupyter Notebook shows how to prepare calibration images from the COCO and HaGRID validation sets and convert two object-detection models, YOLO-NAS S and YOLO-hagRID, for deployment. The ONNX models are converted to SNPE DLC format, and the TensorFlow SavedModels are converted to TFLite. The workflow includes:

- Importing the libraries needed for image processing and file management.
- Downloading the COCO val2017 and HaGRID v2 validation archives and sampling 100 images from each.
- Preprocessing the sampled images into the raw float32 input format used for calibration.
- Converting the ONNX models to SNPE DLC format, including INT8 quantization and HTP graph preparation for a specific SoC (SM7325).
- Converting the SavedModels to float32, float16, and Int8 TFLite models.
- Documenting each step for reproducibility and clarity.

This notebook serves as a practical guide for deploying deep learning models on Qualcomm platforms using the SNPE toolkit.

## How to Use

1. **Build and start the Docker Compose environment** as described in the project documentation.
2. **Access this notebook** in your browser at:  
    [http://127.0.0.1:8888/notebooks/model_optimization.ipynb](http://127.0.0.1:8888/notebooks/model_optimization.ipynb)
3. **Run all cells** in order to convert and optimize the models for Qualcomm chipsets.

In [1]:
# Import necessary libraries.
import glob
import os
import random
import uuid

import cv2 as cv
import numpy as np
import tensorflow as tf

2025-05-31 21:21:51.521210: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-31 21:21:51.586424: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-31 21:21:51.590006: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/.miniconda3/envs/snpe/lib/python3.10/site-packages/cv2/../../lib64:::/root/.miniforge3//lib/:/qairt/2

## Data preprocessing

The `preprocess` function resizes an input image to `size` x `size` pixels (320x320 by default), scales its pixel values to [0, 1], and returns the result as a float32 NumPy array in HWC layout. It keeps OpenCV's BGR channel order and stretches the image without letterboxing, so the aspect ratio is not preserved.

In [2]:
def preprocess(original_image: np.ndarray, size: int = 320) -> np.ndarray:
    """
    Preprocess the input image for model inference.

    Args:
        original_image (np.ndarray): The input image in BGR format.
        size (int): The target size for resizing the image. Default is 320.

    Returns:
        np.ndarray: The preprocessed image in the format expected by the model.
    """

    # Resize the image to (size X size) pixels and normalize pixel values to
    # [0, 1].
    resized_image = cv.resize(original_image, (size, size))
    return (resized_image / 255.).astype(np.float32)
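The `.raw` files written later in this notebook contain nothing but the preprocessed array's bytes. `ndarray.tofile` writes a headerless, C-order float32 buffer in HWC layout, so whoever reads the file must already know the dtype and shape. The self-contained sketch below illustrates that round trip. It skips the OpenCV resize by starting from a synthetic image that is already at the target size, and the temporary file name is purely illustrative.

```python
import os
import tempfile

import numpy as np


def to_raw_tensor(image_bgr: np.ndarray) -> np.ndarray:
    """Normalize a uint8 HWC image to float32 in [0, 1] (no resize here)."""
    return (image_bgr / 255.0).astype(np.float32)


size = 320
rng = np.random.default_rng(0)
# Synthetic uint8 "image" already at the target resolution (H, W, C).
image = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
tensor = to_raw_tensor(image)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.raw")
    tensor.tofile(path)                  # headerless, C-order float32 bytes
    file_bytes = os.path.getsize(path)
    # The file stores neither dtype nor shape, so both must be supplied here.
    restored = np.fromfile(path, dtype=np.float32).reshape(size, size, 3)

print(file_bytes)                        # 320 * 320 * 3 * 4 bytes
print(np.array_equal(restored, tensor))
```

A 320x320 tensor therefore occupies exactly 1,228,800 bytes on disk, and a 640x640 tensor four times as much.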

### Getting the calibration datasets (COCO and HaGRID)
The COCO (Common Objects in Context) dataset is a large-scale image dataset for object detection, segmentation, and captioning. The cell below downloads the COCO val2017 images, which are used to calibrate YOLO-NAS S at 320x320, and the validation split of HaGRID v2, which is used to calibrate YOLO-hagRID at 640x640. It then samples 100 images from each set with a fixed random seed, preprocesses them, and writes them as raw float32 files. These files are the calibration inputs for the quantization steps that follow, so they should resemble the data each model will see in deployment.

In [3]:
if not os.path.exists("val2017.zip"):
    !wget http://images.cocodataset.org/zips/val2017.zip -q --show-progress

if not os.path.exists("val2017"):
    !unzip val2017.zip

if not os.path.exists("val.zip"):
    !wget https://huggingface.co/datasets/testdummyvt/hagRIDv2_512px_10GB/resolve/main/val.zip -q --show-progress

if not os.path.exists("val"):
    !unzip val.zip

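if not os.path.exists("raw"):
    os.makedirs("raw/320")
    os.makedirs("raw/640")

    random.seed(42)
    for size, folder in zip([320, 640], ["val2017", "val"]):
        # Sort before shuffling: glob order depends on the filesystem, so
        # sorting keeps the seeded sample reproducible across machines.
        filenames = sorted(glob.glob(f"{folder}/**/*.jpg", recursive=True))
        random.shuffle(filenames)

        saved = 0
        for filename in filenames:
            image = cv.imread(filename)
            if image is None:  # Skip unreadable or corrupt files.
                continue

            # Preprocess the image and save it in raw format.
            normalized_image = preprocess(image, size=size)
            normalized_image.tofile(f"raw/{size}/{uuid.uuid4()}.raw")
            saved += 1
            if saved == 100:
                break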

!zsh -c 'find raw/320 -name "*.raw" > ./raw/320/input.txt'
!zsh -c 'find raw/640 -name "*.raw" > ./raw/640/input.txt'
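The `find` commands work, but nothing checks that each listed file really holds one tensor of the expected size. The hypothetical `write_input_list` helper below is a sketch of a Python equivalent: it lists the `.raw` files in sorted order, checks that each one contains exactly `size * size * 3` float32 values, and writes one path per line to `input.txt`, the simple single-input form of input list used by the quantization commands in this notebook.

```python
import glob
import os
import tempfile

import numpy as np


def write_input_list(raw_dir: str, size: int, channels: int = 3) -> list[str]:
    """Write raw_dir/input.txt listing every .raw file in sorted order.

    Each file must hold exactly one float32 tensor of shape
    (size, size, channels), as written by preprocess(...).tofile(...).
    """
    expected_bytes = size * size * channels * np.dtype(np.float32).itemsize
    paths = sorted(glob.glob(os.path.join(raw_dir, "*.raw")))
    for path in paths:
        actual = os.path.getsize(path)
        if actual != expected_bytes:
            raise ValueError(f"{path}: {actual} bytes, expected {expected_bytes}")
    with open(os.path.join(raw_dir, "input.txt"), "w") as f:
        f.writelines(path + "\n" for path in paths)
    return paths


# Self-contained demo on a temporary directory with three tiny 8x8 tensors.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        np.zeros((8, 8, 3), dtype=np.float32).tofile(os.path.join(tmp, f"{i}.raw"))
    listed = write_input_list(tmp, size=8)
    with open(os.path.join(tmp, "input.txt")) as f:
        lines = f.read().splitlines()

print(len(listed), lines == listed)
```

In this notebook it would be called as `write_input_list("raw/320", 320)` and `write_input_list("raw/640", 640)`. Because `raw_dir` is relative there, the listed paths stay relative to the notebook's working directory, just like the `find` output.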

## Model Optimization (YOLO-NAS S)
This section covers the process of optimizing a deep learning model for deployment on Qualcomm® chipsets using the Qualcomm® Neural Processing SDK for AI (SNPE). The workflow includes converting a YOLO NAS ONNX model to the SNPE DLC format, quantizing the model for efficient inference, and preparing the model for specific hardware targets.

### 1. Model Conversion

The first step is to convert the ONNX model to the SNPE Deep Learning Container (DLC) format. This is achieved using the `snpe-onnx-to-dlc` tool, which translates the ONNX model into a format compatible with Qualcomm® hardware accelerators.

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc
```

In [4]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc'

2025-05-31 21:21:53,488 - 270 - INFO - Input shape info 
2025-05-31 21:21:55,733 - 270 - INFO - INFO_INITIALIZATION_SUCCESS: 
2025-05-31 21:21:56,337 - 270 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2025-05-31 21:21:56,348 - 270 - INFO - INFO_WRITE_SUCCESS: 
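A `!` shell line in Jupyter does not stop the notebook when the command exits with an error. A failed conversion can therefore go unnoticed until a later cell cannot find the DLC. One alternative, sketched below with a hypothetical `run_checked` helper, is to call the SNPE tools through `subprocess.run(..., check=True)`, so a non-zero exit raises `CalledProcessError`. The demo runs the current Python interpreter instead of an SNPE tool so that the sketch runs anywhere.

```python
import shlex
import subprocess
import sys


def run_checked(cmd: list[str]) -> str:
    """Echo and run a command; raise CalledProcessError on a non-zero exit."""
    print("$", shlex.join(cmd))
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# In this notebook the list would be, for example:
# ["snpe-onnx-to-dlc", "-i", "/models/yolo_nas_s.onnx",
#  "-o", "/models/yolo_nas_s_fp32.dlc"]
out = run_checked([sys.executable, "-c", "print('conversion ok')"])
print(out.strip())
```

With `capture_output=True`, the tool's log appears only after it finishes. On failure, the captured log is available as `err.stderr` on the raised exception.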


### 2. Model Inspection

After conversion, the `snpe-dlc-info` tool is used to inspect the DLC file. This step ensures that the model has been correctly converted and provides information about the model's input and output tensors.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc
```

In [5]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc'

DLC info of: /models/yolo_nas_s_fp32.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; unroll_gru_time_steps=True; quantization_overrides=; prepare_inputs_as_params=False; perform_axes_to_spatial_first_order=True; unroll_lstm_time_steps=True; packed_max_seq=1; validation_target=[]; packed_masked_softmax_inputs=[]; preserve_io=[]; debug=-1; dump_qairt_io_config_yaml=; multi_time_steps_lstm=False; multi_time_steps_gru=False; preserve_onnx_output_order=False; perform_layout_transformation=False; keep_quant_nodes=False; input_dtype=[]; keep_disconnected_nodes=False; input_encoding=[]; handle_gather_negative_indices=True; copyright_file=None; force_prune_cast_ops=False; package_name=None; use_convert_quantization_nodes=False; defer_loading=False; dry_run=None; input_type=[]; float_bitwidth=32; out_names=['output_bboxes', 'output_classes']; float_bw=32; model_version=N
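The `Converter command` line records the options used at conversion time, including `out_names`. Those names are the values later passed to `--set_output_tensors` in the graph-preparation step, so it can help to extract them rather than retype them. The regex-based helper below is a sketch that assumes the `out_names=[...]` field keeps the format shown in the output above. The `info_line` string is abridged from that output.

```python
import ast
import re

# Abridged from the "Converter command" line printed by snpe-dlc-info above.
info_line = (
    "Converter command: snpe-onnx-to-dlc; unroll_gru_time_steps=True; "
    "float_bitwidth=32; out_names=['output_bboxes', 'output_classes']; "
    "float_bw=32"
)


def output_names(converter_line: str) -> list[str]:
    """Extract the out_names=[...] list from a converter command string."""
    match = re.search(r"out_names=(\[[^\]]*\])", converter_line)
    if match is None:
        raise ValueError("out_names not found")
    # The captured text is a Python-style list literal, so parse it safely.
    return ast.literal_eval(match.group(1))


names = output_names(info_line)
print(names)            # ['output_bboxes', 'output_classes']
print(",".join(names))  # value for --set_output_tensors
```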

### 3. Model Quantization

Quantization shrinks the model and speeds up inference by representing weights and activations as 8-bit fixed-point values instead of 32-bit floats. The `snpe-dlc-quantize` tool runs the calibration images listed in `input.txt` (prepared in the previous steps) through the model to estimate the value ranges it needs, then writes an INT8 DLC.

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/320/input.txt --output_dlc /models/yolo_nas_s_int8.dlc
```

In [6]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/320/input.txt --output_dlc /models/yolo_nas_s_int8.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Processed command-line arguments
     0.4ms [  INFO ] Inferences will run in sync mode
Processing inference input(s):
raw/320/71184bc4-9a65-4234-80a7-56bda329a03b.raw
raw/320/cdf6dfcd-5d5d-4f6f-b6f9-8d8555e83958.raw
raw/320/ae764ef4-93a0-4f69-9655-dbd57e98b866.raw
raw/320/2c5a7b1e-332e-47c7-a35f-4c2f05096e27.raw
raw/320/0ad064af-d2af-4635-bcea-83cb372d094f.raw
raw/320/9b3794ef-be81-41d1-b70d-902f2456c906.raw
raw/320/5f236cd9-f7c2-4949-ab61-3aa3b872e25e.raw
raw/320/8293e765-e3db-4c9a-bd9d-8eb9f0170d19.raw
raw/320/4f0b3c72-fe85-45b7-a12a-b18a6f1450c9.raw
raw/320/0f8264b3-ef53-472e-b266-003be256f2e5.raw
raw/320/fc9b59cd-71d5-4052-b54a-f342649d25fd.raw
raw/320/b6f50102-fca1-4625-8b3a-3794dfae96e2.raw
raw/320/570d170a-1a73-4f98-b824-e63cfaf70e12.raw
raw/320/983405a1-f276-49a1-9390-bed91f50edd3.raw
raw/320/374f120d-4217-4153-9088-ea6e591b6893.raw
raw/320/e773da2a-df3f-47c2-975e-0911abc44293.raw
raw/320/343d7af9-a87b-4235-9e53-b531d4097d58
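`snpe-dlc-quantize` derives an 8-bit encoding (a scale and an offset) for each tensor from the ranges it observes while running the calibration inputs. The numpy sketch below illustrates the general asymmetric min/max scheme behind such encodings, with `affine_params`, `quantize`, and `dequantize` as hypothetical helper names. It is a simplified illustration, not SNPE's exact implementation, which offers additional calibration options.

```python
import numpy as np


def affine_params(x_min: float, x_max: float, bits: int = 8) -> tuple[float, int]:
    """Scale and zero point for asymmetric unsigned quantization.

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    q_max = 2**bits - 1
    scale = (x_max - x_min) / q_max if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    return scale, zero_point


def quantize(x: np.ndarray, scale: float, zero_point: int, bits: int = 8) -> np.ndarray:
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2**bits - 1).astype(np.uint8)


def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale


# "Calibration": observe the range of some activations, then fix the parameters.
rng = np.random.default_rng(0)
activations = rng.normal(0.5, 1.0, size=10_000).astype(np.float32)
scale, zero_point = affine_params(float(activations.min()), float(activations.max()))

q = quantize(activations, scale, zero_point)
error = np.abs(dequantize(q, scale, zero_point) - activations)
print(f"scale={scale:.5f} zero_point={zero_point} max_error={error.max():.5f}")
```

Inside the calibrated range, the rounding error is at most half a quantization step (`scale / 2`). Values outside that range are clipped, which is one reason calibration data that does not resemble real inputs can hurt accuracy.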

### 4. Post-Quantization Inspection

After quantization, the model is inspected again to verify the changes and ensure the quantized model is ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8.dlc
```

In [7]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8.dlc'

DLC info of: /models/yolo_nas_s_int8.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_run=None; us
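One concrete check after quantization is to compare the file sizes of the float and quantized DLCs instead of assuming a particular reduction. The hypothetical `compression_ratio` helper below is a minimal sketch. The demo uses placeholder files, and the actual ratio for a DLC also depends on metadata and stored encodings, not only on the weights.

```python
import os
import tempfile


def compression_ratio(fp32_path: str, int8_path: str) -> float:
    """How many times smaller the quantized file is than the float one."""
    return os.path.getsize(fp32_path) / os.path.getsize(int8_path)


# Demo with placeholder files; in this notebook the arguments would be
# "/models/yolo_nas_s_fp32.dlc" and "/models/yolo_nas_s_int8.dlc".
with tempfile.TemporaryDirectory() as tmp:
    fp32_file = os.path.join(tmp, "model_fp32.bin")
    int8_file = os.path.join(tmp, "model_int8.bin")
    with open(fp32_file, "wb") as f:
        f.write(b"\0" * 4000)
    with open(int8_file, "wb") as f:
        f.write(b"\0" * 1000)
    ratio = compression_ratio(fp32_file, int8_file)

print(f"{ratio:.1f}x smaller")
```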

### 5. Hardware-Specific Graph Preparation

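To optimize the model further for a specific Qualcomm® SoC (here SM7325), the `snpe-dlc-graph-prepare` tool prepares the graph for the target's HTP (Hexagon Tensor Processor) offline and stores the result as a cache record in the DLC, as the log below shows ("Attempting to create cache for SM7325"). This means the preparation does not have to happen when the model is loaded on the device. The `--set_output_tensors` option selects the tensors the prepared graph exposes as outputs (`output_bboxes` and `output_classes`).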

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [8]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM7325
[USER_INFO] Target device backend record identifier: HTP_V68_SM7325_2MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V68_SM7325_2MB). Creating a new record
[USER_INFO] Checking unsigned PD session
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x1f8d7bc0
[INFO] Found Interface Provider (v2.25)
[USER_INFO] Platform option not set
[USER_INFO] Created ctx=0x1 for Graph Id=0 backend=HTP SNPE Id=0x1f73f728
[USER_INFO] Context [0x1] Setting priority to: default
[USER_INFO] Offline Prepare VTCM size(MB) selected = 0
[USER_INFO] Offline Prepare Optimization Level passed = 2

spill_bytes=2191360
fill_bytes=2191360
write_total_bytes=2371584
read_total_bytes=18309120

[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle=0x1 for Graph Id=0 backend=HTP SN
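The prepare step also prints memory-traffic statistics such as `spill_bytes` and `fill_bytes`. One plausible reading is that these count data moved out of and back into on-chip memory, but that interpretation is an assumption here. Whatever their exact meaning, comparing them across models or prepare settings is easier when they are collected into a dict. The parser below is a sketch that assumes these statistics always appear as bare `key=integer` lines, as in the output above.

```python
def parse_prepare_metrics(log_text: str) -> dict[str, int]:
    """Collect bare key=integer lines (e.g. spill_bytes=...) from a log."""
    metrics = {}
    for line in log_text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Skip bracketed log lines and anything whose value is not an integer.
        if sep and key.isidentifier() and value.isdigit():
            metrics[key] = int(value)
    return metrics


# Excerpt of the YOLO-NAS S output above.
log = """\
[USER_INFO] Offline Prepare Optimization Level passed = 2

spill_bytes=2191360
fill_bytes=2191360
write_total_bytes=2371584
read_total_bytes=18309120
"""
m = parse_prepare_metrics(log)
print(f"{m['spill_bytes'] / 2**20:.2f} MiB spilled")
```

For comparison, the YOLO-hagRID run later in this notebook reports `spill_bytes=4106240`, a little under twice the YOLO-NAS S figure.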

### 6. Final Model Inspection

A final inspection confirms that the model is correctly prepared for the target hardware and ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [9]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc'

DLC info of: /models/yolo_nas_s_int8_htp_sm7325.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_r
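The final inspection looks at one file. Before copying models to a device, it may also be worth confirming that every DLC the pipeline should have produced exists. The hypothetical helpers below encode this notebook's naming scheme (`_fp32`, `_int8`, `_int8_htp_<soc>`) and report anything missing. The demo uses a temporary directory in place of `/models`.

```python
import os
import tempfile


def expected_artifacts(model: str, soc: str = "sm7325",
                       models_dir: str = "/models") -> list[str]:
    """DLC files this notebook writes for one model, using its naming scheme."""
    return [
        f"{models_dir}/{model}_fp32.dlc",
        f"{models_dir}/{model}_int8.dlc",
        f"{models_dir}/{model}_int8_htp_{soc}.dlc",
    ]


def missing_artifacts(paths: list[str]) -> list[str]:
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# In this notebook: assert not missing_artifacts(expected_artifacts("yolo_nas_s"))
with tempfile.TemporaryDirectory() as tmp:
    paths = expected_artifacts("yolo_nas_s", models_dir=tmp)
    open(paths[0], "wb").close()  # pretend only the fp32 DLC was produced
    missing = missing_artifacts(paths)

print([os.path.basename(p) for p in missing])
```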

## Model Optimization (YOLO-hagRID)
This section applies the same six steps to the YOLO-hagRID model. Only the file names and the calibration data differ: quantization uses the 640x640 HaGRID images listed in `./raw/640/input.txt` instead of the 320x320 COCO images.
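Because the two pipelines differ only in the model name and the calibration resolution, the commands can be generated from those two parameters. The hypothetical `snpe_pipeline` function below only builds the argument lists and executes nothing. It uses the same flags and file names as this notebook, and each list could then be passed to `subprocess.run(cmd, check=True)`.

```python
def snpe_pipeline(model: str, calib_size: int, soc: str = "sm7325",
                  outputs: tuple[str, ...] = ("output_bboxes", "output_classes"),
                  models_dir: str = "/models") -> list[list[str]]:
    """Build the convert, quantize and graph-prepare commands for one model."""
    onnx = f"{models_dir}/{model}.onnx"
    fp32 = f"{models_dir}/{model}_fp32.dlc"
    int8 = f"{models_dir}/{model}_int8.dlc"
    htp = f"{models_dir}/{model}_int8_htp_{soc}.dlc"
    return [
        ["snpe-onnx-to-dlc", "-i", onnx, "-o", fp32],
        ["snpe-dlc-quantize", "--input_dlc", fp32,
         "--input_list", f"./raw/{calib_size}/input.txt", "--output_dlc", int8],
        ["snpe-dlc-graph-prepare", "--input_dlc", int8,
         f"--set_output_tensors={','.join(outputs)}",
         f"--htp_socs={soc}", f"--output_dlc={htp}"],
    ]


for model, size in [("yolo_nas_s", 320), ("yolo_hagRID", 640)]:
    for cmd in snpe_pipeline(model, size):
        print(" ".join(cmd))
```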

### 1. Model Conversion

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc
```

In [10]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc'

2025-05-31 21:22:27,783 - 270 - INFO - Input shape info 
2025-05-31 21:22:29,759 - 270 - INFO - INFO_INITIALIZATION_SUCCESS: 
2025-05-31 21:22:29,917 - 270 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2025-05-31 21:22:29,925 - 270 - INFO - INFO_WRITE_SUCCESS: 


### 2. Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc
```

In [11]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc'

DLC info of: /models/yolo_hagRID_fp32.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; unroll_gru_time_steps=True; quantization_overrides=; prepare_inputs_as_params=False; perform_axes_to_spatial_first_order=True; unroll_lstm_time_steps=True; packed_max_seq=1; validation_target=[]; packed_masked_softmax_inputs=[]; preserve_io=[]; debug=-1; dump_qairt_io_config_yaml=; multi_time_steps_lstm=False; multi_time_steps_gru=False; preserve_onnx_output_order=False; perform_layout_transformation=False; keep_quant_nodes=False; input_dtype=[]; keep_disconnected_nodes=False; input_encoding=[]; handle_gather_negative_indices=True; copyright_file=None; force_prune_cast_ops=False; package_name=None; use_convert_quantization_nodes=False; defer_loading=False; dry_run=None; input_type=[]; float_bitwidth=32; out_names=['output_bboxes', 'output_classes']; float_bw=32; model_version=

### 3. Model Quantization

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/640/input.txt --output_dlc /models/yolo_hagRID_int8.dlc
```

In [12]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/640/input.txt --output_dlc /models/yolo_hagRID_int8.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Processed command-line arguments
     0.6ms [  INFO ] Inferences will run in sync mode
Processing inference input(s):
raw/640/e8668bcc-a2dd-485a-bbeb-69d92a088f34.raw
raw/640/27dde55b-32d7-4865-a48c-57d9c56270be.raw
raw/640/b6f13561-fa4f-41a8-8f81-88e38f3513c1.raw
raw/640/304d114e-5d0e-48d9-b3b7-cd391a0b6161.raw
raw/640/c1e8d92c-93c5-431d-b554-53177f4c7a0e.raw
raw/640/77bd2c24-2111-4592-8e5f-ed8f37cda16c.raw
raw/640/9d2b0622-3011-4179-a9e3-e4e5170e5e3a.raw
raw/640/c016fe02-135b-4a4e-a009-ebac7e35877b.raw
raw/640/14e01a48-456a-4047-a27a-e7791d9cb7a6.raw
raw/640/39a3c245-e5a1-4ae0-a78d-928817477353.raw
raw/640/bc6066e1-7e26-4e3e-a297-668bf8e80c6d.raw
raw/640/5f0d858d-0afb-4ceb-8a72-11679bf0ba0b.raw
raw/640/113f0698-ee1b-449d-9507-aee025035f5f.raw
raw/640/46ae7fc5-5a51-4c94-aeae-1707e95663d1.raw
raw/640/994cd529-0184-4543-ab5c-1c417aac9aeb.raw
raw/640/6bbd446f-9962-4b8c-b358-a76ced2b2fc3.raw
raw/640/3780aeb4-86ad-4f41-9925-470d60ca8422

### 4. Post-Quantization Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8.dlc
```

In [13]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8.dlc'

DLC info of: /models/yolo_hagRID_int8.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_run=None; u

### 5. Hardware-Specific Graph Preparation

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [14]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM7325
[USER_INFO] Target device backend record identifier: HTP_V68_SM7325_2MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V68_SM7325_2MB). Creating a new record
[USER_INFO] Checking unsigned PD session
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0xfd750f0
[INFO] Found Interface Provider (v2.25)
[USER_INFO] Platform option not set
[USER_INFO] Created ctx=0x1 for Graph Id=0 backend=HTP SNPE Id=0xfd68728
[USER_INFO] Context [0x1] Setting priority to: default
[USER_INFO] Offline Prepare VTCM size(MB) selected = 0
[USER_INFO] Offline Prepare Optimization Level passed = 2

spill_bytes=4106240
fill_bytes=4106240
write_total_bytes=4433920
read_total_bytes=8704000

[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle=0x1 for Graph Id=0 backend=HTP SNPE 

### 6. Final Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [15]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc'

DLC info of: /models/yolo_hagRID_int8_htp_sm7325.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_

## Model Optimization (TensorFlow to TFLite)
This section converts the TensorFlow SavedModels of both models (`yolo_nas_s` and `yolo_hagRID`) to TensorFlow Lite (TFLite) for deployment on mobile and edge devices. Three variants are produced: float32, float16 (float16 weights), and Int8 (post-training quantization calibrated with a representative dataset).

### Converting the model to Float32 (CPU, GPU and NNAPI)
The following cell loads each SavedModel and converts it to TFLite without any optimizations, so the weights stay in float32. Setting `supported_ops` to `TFLITE_BUILTINS` limits the model to built-in TFLite operators (no TensorFlow Select ops). Those built-ins are what the CPU kernels and the GPU and NNAPI delegates work with, although each delegate supports only a subset of operators. Each converted model is saved as `/models/<model>_float32.tflite`, and its outputs should match the SavedModel's up to floating-point round-off.

In [16]:
for model in ["yolo_nas_s", "yolo_hagRID"]:

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Allow only built-in TFLite ops (no TF Select ops). With no
    # optimizations set, the weights stay float32.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

    # Step 3: Convert the model.
    tflite_model = converter.convert()

    # Step 4: Save the model.
    output_path = f"/models/{model}_float32.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Float32 model saved to: {output_path}")

2025-05-31 21:23:16.759062: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/.miniconda3/envs/snpe/lib/python3.10/site-packages/cv2/../../lib64:::/root/.miniforge3//lib/:/qairt/2.34.0.250424/lib/x86_64-linux-clang
2025-05-31 21:23:16.759078: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2025-05-31 21:23:16.759087: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (hagrid_optimizer): /proc/driver/nvidia/version does not exist
2025-05-31 21:23:16.759265: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other oper

Estimated count of arithmetic ops: 7.993 G  ops, equivalently 3.996 G  MACs
✅ Float32 model saved to: /models/yolo_nas_s_float32.tflite


2025-05-31 21:23:16.978519: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2025-05-31 21:23:16.978535: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2025-05-31 21:23:16.978919: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/yolo_nas_s
2025-05-31 21:23:16.995852: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2025-05-31 21:23:16.995884: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/yolo_nas_s
2025-05-31 21:23:17.026621: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2025-05-31 21:23:17.027734: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2025-05-31 21:23:17.073081: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/yolo_nas_s
2

Estimated count of arithmetic ops: 7.523 G  ops, equivalently 3.762 G  MACs
✅ Float32 model saved to: /models/yolo_hagRID_float32.tflite
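One way to confirm a conversion is to feed the same preprocessed input to the SavedModel and to `tf.lite.Interpreter` (via `allocate_tensors`, `set_tensor`, `invoke` and `get_tensor`) and compare the outputs. The numpy-only helper below covers just the comparison step. Which output tensors to compare and what error counts as acceptable are left as assumptions, and the demo arrays are synthetic stand-ins for real model outputs.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict[str, float]:
    """Error statistics between a reference output and a converted model's output."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    ref = reference.astype(np.float64).ravel()
    cand = candidate.astype(np.float64).ravel()
    diff = np.abs(ref - cand)
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    cosine = float(ref @ cand / denom) if denom > 0 else float("nan")
    return {
        "max_abs_err": float(diff.max()),
        "mean_abs_err": float(diff.mean()),
        "cosine": cosine,
    }


# Synthetic stand-ins: a "reference" output and a slightly perturbed copy.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1, 100, 4)).astype(np.float32)
candidate = reference + rng.normal(scale=1e-6, size=reference.shape).astype(np.float32)
print(compare_outputs(reference, candidate))
```

For the float32 models, errors near float32 round-off are expected. For the float16 and Int8 variants, the same helper gives a quick sense of how much each step moves the outputs.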


### Converting the model to Float16 (CPU and GPU)
This section converts the same SavedModels to TFLite with float16 weights, which roughly halves the model size. For both YOLO-NAS S and YOLO-hagRID, the cell configures the `TFLiteConverter` to:

- Enable the default optimizations (`tf.lite.Optimize.DEFAULT`).
- Store the weights as float16 (`supported_types = [tf.float16]`). The GPU delegate can compute in float16 directly. On the CPU, the weights are dequantized to float32 by default before inference.
- Use only built-in TFLite operations (no TensorFlow Select ops) for maximum delegate compatibility.

The converted models are saved as `/models/<model>_float16.tflite`. Compared with float32, they trade a small amount of precision for a smaller file and, on hardware with native float16 support, faster inference.
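The numpy snippet below, which is independent of TensorFlow, illustrates what storing weights as float16 costs. The bytes are halved, and for values in float16's normal range the relative rounding error stays at or below 2^-11 (about 4.9e-4). Values very close to zero fall into float16's subnormal range and lose more precision, which this sketch avoids by construction. The random "weights" are an assumption chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a float32 weight tensor; magnitudes kept in float16's normal range.
magnitudes = rng.uniform(0.01, 1.0, size=(64, 3, 3, 3))
signs = rng.choice([-1.0, 1.0], size=magnitudes.shape)
weights = (magnitudes * signs).astype(np.float32)

weights_fp16 = weights.astype(np.float16)    # what gets stored in the file
restored = weights_fp16.astype(np.float32)   # what a float32 kernel computes with

rel_err = np.abs(restored - weights) / np.abs(weights)
print(weights.nbytes, weights_fp16.nbytes)
print(f"max relative error: {rel_err.max():.2e} (bound 2**-11 = {2**-11:.2e})")
```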

In [17]:
for model in ["yolo_nas_s", "yolo_hagRID"]:

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Enable optimization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Step 3: Store the weights as float16.
    converter.target_spec.supported_types = [tf.float16]

    # Step 4: Allow only built-in TFLite ops (no TF Select ops).
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

    # Step 5: Convert the model.
    tflite_model = converter.convert()

    # Step 6: Save the converted model.
    output_path = f"/models/{model}_float16.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Float16 model saved to: {output_path}")

Estimated count of arithmetic ops: 7.993 G  ops, equivalently 3.996 G  MACs
✅ Float16 model saved to: /models/yolo_nas_s_float16.tflite


2025-05-31 21:23:18.251328: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2025-05-31 21:23:18.251343: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2025-05-31 21:23:18.251429: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/yolo_nas_s
2025-05-31 21:23:18.267059: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2025-05-31 21:23:18.267086: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/yolo_nas_s
2025-05-31 21:23:18.288698: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2025-05-31 21:23:18.334262: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/yolo_nas_s
2025-05-31 21:23:18.355658: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 1

Estimated count of arithmetic ops: 7.523 G  ops, equivalently 3.762 G  MACs
✅ Float16 model saved to: /models/yolo_hagRID_float16.tflite


2025-05-31 21:23:18.911289: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2025-05-31 21:23:18.911317: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2025-05-31 21:23:18.911401: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/yolo_hagRID
2025-05-31 21:23:18.916548: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2025-05-31 21:23:18.916566: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/yolo_hagRID
2025-05-31 21:23:18.921605: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2025-05-31 21:23:18.943368: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/yolo_hagRID
2025-05-31 21:23:18.959917: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Too

### Converting the model to Int8 (CPU and NNAPI)
This section converts the SavedModels to Int8 TFLite models using post-training quantization. For both YOLO-NAS S and YOLO-hagRID, the cell:

- Loads the SavedModel.
- Enables the default optimizations, which allows quantization.
- Defines a representative dataset generator that feeds calibration inputs to the converter so it can estimate activation ranges. The quantization parameters are only as good as this data, so it should resemble real model inputs.
- Allows Int8 built-in ops (`TFLITE_BUILTINS_INT8`), with float32 built-in ops (`TFLITE_BUILTINS`) as a fallback for operations that have no Int8 kernel.
- Converts the model and saves the quantized TFLite file.

Because `inference_input_type` and `inference_output_type` are not set, the model's inputs and outputs stay float32 (see `input_inference_type: FLOAT32` in the log below), so the result is not an integer-only model. These models target devices with TFLite Int8 support, such as the CPU and NNAPI, though delegate coverage depends on the operators in each model.
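TFLite's quantization scheme uses symmetric int8 weights (zero point 0, values in [-127, 127]) with a separate scale per output channel for convolution kernels. Activations get asymmetric int8 parameters derived from the representative dataset. The numpy sketch below illustrates only the per-channel weight part, with `quantize_per_channel` as a hypothetical helper. The kernel layout and random values are assumptions chosen for the demonstration, including one output channel with much larger weights to show why per-channel scales help.

```python
import numpy as np


def quantize_per_channel(kernel: np.ndarray, axis: int = 0):
    """Symmetric per-channel int8 quantization (zero point fixed at 0).

    Each slice along `axis` gets its own scale = max|w| / 127.
    """
    reduce_axes = tuple(i for i in range(kernel.ndim) if i != axis)
    max_abs = np.max(np.abs(kernel), axis=reduce_axes, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(kernel / scales), -127, 127).astype(np.int8)
    return q, scales


rng = np.random.default_rng(0)
# Kernel laid out as (out_channels, kh, kw, in_channels); channel 0 is scaled up.
kernel = rng.normal(0.0, 0.05, size=(16, 3, 3, 8)).astype(np.float32)
kernel[0] *= 20.0

q, scales = quantize_per_channel(kernel, axis=0)
err = np.abs(q.astype(np.float32) * scales - kernel)

# Per-tensor alternative: one scale for the whole kernel.
tensor_scale = np.abs(kernel).max() / 127.0
restored_pt = np.clip(np.round(kernel / tensor_scale), -127, 127) * tensor_scale
err_pt = np.abs(restored_pt - kernel)

print(f"per-channel mean error: {err.mean():.2e}")
print(f"per-tensor mean error:  {err_pt.mean():.2e}")
```

With a single per-tensor scale, the large channel forces a coarse step on every other channel, so per-channel scales give a much smaller average error here.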

In [None]:
for model, size in zip(["yolo_nas_s", "yolo_hagRID"], [320, 640]):

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Enable optimizations.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

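    # Step 3: Define a representative dataset generator for calibration.
    # It reuses the preprocessed images written to raw/{size}/ earlier
    # (float32, HWC, BGR, values in [0, 1]) instead of random noise, and
    # assumes the SavedModel takes NHWC input of shape (1, size, size, 3).
    calibration_files = sorted(glob.glob(f"raw/{size}/*.raw"))

    def representative_data_gen():
        for path in calibration_files:
            sample = np.fromfile(path, dtype=np.float32)
            yield [sample.reshape(1, size, size, 3)]

    converter.representative_dataset = representative_data_gen

    # Step 4: Prefer Int8 kernels, falling back to float32 built-in ops for
    # layers without an Int8 implementation.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS_INT8,  # Int8 built-in ops
        tf.lite.OpsSet.TFLITE_BUILTINS,       # Float32 fallback
    ]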

    # Step 5: Convert the model.
    tflite_model = converter.convert()

    # Step 6: Save the optimized model to a file
    output_path = f"/models/{model}_int8.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Int8 model saved to: {output_path}")

2025-05-31 21:23:19.639773: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2025-05-31 21:23:19.639791: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2025-05-31 21:23:19.639882: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/yolo_nas_s
2025-05-31 21:23:19.656665: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2025-05-31 21:23:19.656691: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/yolo_nas_s
2025-05-31 21:23:19.669965: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2025-05-31 21:23:19.698320: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/yolo_nas_s
2025-05-31 21:23:19.721041: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 8

Estimated count of arithmetic ops: 7.993 G  ops, equivalently 3.996 G  MACs


fully_quantize: 0, inference_type: 6, input_inference_type: FLOAT32, output_inference_type: FLOAT32
2025-05-31 21:23:37.385359: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1989] Estimated count of arithmetic ops: 7.993 G  ops, equivalently 3.996 G  MACs



Estimated count of arithmetic ops: 7.993 G  ops, equivalently 3.996 G  MACs
✅ Int8 model saved to: /models/yolo_nas_s_int8.tflite


2025-05-31 21:23:37.646554: E tensorflow/core/framework/node_def_util.cc:675] NodeDef mentions attribute debug_name which is not in the op definition: Op<name=VarHandleOp; signature= -> resource:resource; attr=container:string,default=""; attr=shared_name:string,default=""; attr=dtype:type; attr=shape:shape; attr=allowed_devices:list(string),default=[]; is_stateful=true> This may be expected if your graph generating binary is newer  than this binary. Unknown attributes will be ignored. NodeDef: {{node wa/model.10/m/m.0/attn/pe/conv/Conv/kernel}}
2025-05-31 21:23:37.668838: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2025-05-31 21:23:37.668849: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2025-05-31 21:23:37.668936: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/yolo_hagRID
2025-05-31 21:23:37.674026: I tensorflow/cc/saved_model/reader.cc:89] Readi

Estimated count of arithmetic ops: 7.523 G  ops, equivalently 3.762 G  MACs


These steps convert the models into formats suited to Qualcomm® platforms: DLC files prepared for the SM7325 HTP, and TFLite models for the CPU, GPU, and NNAPI delegates. Quantization can cost accuracy, so the Int8 variants should still be validated on the target device against their float32 counterparts before deployment.