# Model Optimization Notebook

This Jupyter Notebook prepares calibration images and converts two object-detection models, YOLO-NAS S and YOLO-hagRID, into deployable formats: SNPE DLC for Qualcomm hardware and TensorFlow Lite for mobile and edge devices in general. The workflow includes:

- Importing the libraries needed for image processing and file management.
- Downloading and extracting the COCO 2017 validation images and a hagRID validation archive.
- Preprocessing 100 randomly selected images per dataset into the raw input format used for calibration.
- Converting each ONNX model to SNPE DLC format, including INT8 quantization and graph preparation for a specific hardware target.
- Converting the TensorFlow SavedModels to TFLite in float32, float16, and Int8 variants.
- Documenting each step for reproducibility and clarity.

This notebook serves as a practical guide for deploying deep learning models on Qualcomm platforms using the SNPE toolkit.

## How to Use

1. **Build and start the Docker Compose environment** as described in the project documentation.
2. **Access this notebook** in your browser at:  
    [http://127.0.0.1:8888/notebooks/model_optimization.ipynb](http://127.0.0.1:8888/notebooks/model_optimization.ipynb)
3. **Run all cells** in order to optimize the ONNX models for Qualcomm chipsets.

In [None]:
# Import necessary libraries.
import glob
import os
import random
import uuid

import cv2 as cv
import numpy as np
import tensorflow as tf

## Data preprocessing

The `preprocess` function resizes an input image to 320x320 pixels, normalizes its pixel values to the range [0, 1], and returns the processed image as a NumPy array of type float32, preparing it for model inference.

In [None]:
def preprocess(original_image: np.ndarray, size: int = 320) -> np.ndarray:
    """
    Preprocess the input image for model inference.

    Args:
        original_image (np.ndarray): The input image in BGR format.
        size (int): The target size for resizing the image. Default is 320.

    Returns:
        np.ndarray: The preprocessed image in the format expected by the model.
    """

    # Resize the image to (size X size) pixels and normalize pixel values to
    # [0, 1].
    resized_image = cv.resize(original_image, (size, size))
    return (resized_image / 255.).astype(np.float32)
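
The properties described above (shape, dtype, value range) can be checked before any file is written. The following is a minimal sketch using NumPy only; `check_preprocessed` is a hypothetical helper that is not part of the notebook, and the random array merely stands in for a resized 8-bit image that would normally come from `cv.imread` and `cv.resize`.

```python
import numpy as np


def check_preprocessed(image: np.ndarray, size: int = 320) -> None:
    """Raise AssertionError if `image` does not look like `preprocess` output."""
    # cv.resize keeps the channel axis last, so the layout is (H, W, C).
    assert image.shape == (size, size, 3), f"unexpected shape {image.shape}"
    assert image.dtype == np.float32, f"unexpected dtype {image.dtype}"
    assert 0.0 <= image.min() and image.max() <= 1.0, "values outside [0, 1]"


# Stand-in for a resized 8-bit image (hypothetical data, not a real photo).
fake_resized = np.random.randint(0, 256, size=(320, 320, 3), dtype=np.uint8)

# Same normalization as in `preprocess`: scale to [0, 1] and cast to float32.
normalized = (fake_resized / 255.0).astype(np.float32)
check_preprocessed(normalized)
```

Calling `check_preprocessed(preprocess(image))` inside the dataset loop would surface a wrong channel count or dtype early, before the quantizer reads the files.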

### Getting the COCO and hagRID datasets
The COCO (Common Objects in Context) dataset is a large-scale image dataset designed for object detection, segmentation, and captioning tasks. This pipeline samples 100 images from the COCO 2017 validation set as calibration data for YOLO-NAS S, and 100 images from a hagRID validation archive hosted on Hugging Face as calibration data for YOLO-hagRID. Each image is preprocessed and written as a headerless float32 `.raw` file, which is the format the quantization step reads. Calibrating on real, diverse images means the quantized models are tuned on data representative of what they will see in deployment.

In [None]:
if not os.path.exists("val2017.zip"):
    !zsh -c 'wget http://images.cocodataset.org/zips/val2017.zip -q --show-progress'

if not os.path.exists("val2017"):
    !zsh -c 'unzip val2017.zip'

if not os.path.exists("val.zip"):
    !zsh -c 'wget https://huggingface.co/datasets/testdummyvt/hagRIDv2_512px_10GB/resolve/main/val.zip -q --show-progress'

if not os.path.exists("val"):
    !zsh -c 'unzip val.zip'

if not os.path.exists("raw"):
    !zsh -c 'mkdir raw'
    !zsh -c 'mkdir raw/coco'
    !zsh -c 'mkdir raw/hagRID'

    random.seed(42)
    for dataset, folder in zip(["coco", "hagRID"], ["val2017", "val"]):
        filenames = glob.glob(f"{folder}/**/*.jpg", recursive=True)
        random.shuffle(filenames)

        for filename in filenames[:100]:
            image = cv.imread(filename)

            # cv.imread returns None for unreadable files; skip those.
            if image is None:
                continue

            # Preprocess the image and save it in raw format.
            normalized_image = preprocess(image)
            normalized_image.tofile(f"raw/{dataset}/{uuid.uuid4()}.raw")

!zsh -c 'rm -rf __MACOSX'
!zsh -c 'find raw/coco -name "*.raw" > ./raw/coco/input.txt'
!zsh -c 'find raw/hagRID -name "*.raw" > ./raw/hagRID/input.txt'
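
The input lists can also be built in Python, which makes it easy to validate each file first. The sketch below is an alternative to the `find` commands, not the notebook's own method; `write_input_list` is a hypothetical helper. It relies on the fact that a 320x320x3 float32 image occupies 320 * 320 * 3 * 4 = 1,228,800 bytes, so any `.raw` file of a different size is skipped.

```python
from pathlib import Path


def write_input_list(raw_dir: str, size: int = 320, channels: int = 3) -> list:
    """Write `<raw_dir>/input.txt` listing every correctly sized .raw file."""
    expected_bytes = size * size * channels * 4  # float32 = 4 bytes per value
    valid = []
    for path in sorted(Path(raw_dir).glob("*.raw")):
        actual_bytes = path.stat().st_size
        if actual_bytes == expected_bytes:
            valid.append(path.as_posix())
        else:
            print(f"Skipping {path}: {actual_bytes} bytes, expected {expected_bytes}")

    # One path per line, matching the layout produced by `find ... > input.txt`.
    Path(raw_dir, "input.txt").write_text("".join(f"{p}\n" for p in valid))
    return valid
```

For example, `write_input_list("raw/coco")` would produce the same kind of list as the `find` command for the COCO samples, with relative paths such as `raw/coco/<uuid>.raw`.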

## Model Optimization (YOLO-NAS S)
This section covers the process of optimizing a deep learning model for deployment on Qualcomm® chipsets using the Qualcomm® Neural Processing SDK for AI (SNPE). The workflow includes converting the YOLO-NAS S ONNX model to the SNPE DLC format, quantizing the model for efficient inference, and preparing the model for specific hardware targets.

### 1. Model Conversion

The first step is to convert the ONNX model to the SNPE Deep Learning Container (DLC) format. This is achieved using the `snpe-onnx-to-dlc` tool, which translates the ONNX model into a format compatible with Qualcomm® hardware accelerators.

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc
```

In [None]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc'
!zsh -c 'rm -rf output'
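
A `!` shell escape does not stop the notebook when the command fails, so a broken conversion can go unnoticed until a later cell. One possible safeguard, sketched here with the standard library only, is to run the tools through `subprocess` so that a non-zero exit code raises an exception. `run_checked` is a hypothetical helper, and the demonstration call uses the Python interpreter instead of an SNPE tool so that it runs anywhere.

```python
import subprocess
import sys


def run_checked(args: list) -> str:
    """Run a command, return its stdout, and raise if it exits with an error."""
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout


# Harmless demonstration using the current Python interpreter.
print(run_checked([sys.executable, "-c", "print('ok')"]))
```

Under the same assumption, the conversion step would become `run_checked(["snpe-onnx-to-dlc", "-i", "/models/yolo_nas_s.onnx", "-o", "/models/yolo_nas_s_fp32.dlc"])`, which raises `subprocess.CalledProcessError` if the tool reports a failure.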

### 2. Model Inspection

After conversion, the `snpe-dlc-info` tool is used to inspect the DLC file. This step ensures that the model has been correctly converted and provides information about the model's input and output tensors.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc'

### 3. Model Quantization

Quantization typically reduces the model size and increases inference speed by converting floating-point weights and activations to 8-bit integers. The `snpe-dlc-quantize` tool uses the calibration images listed in `./raw/coco/input.txt` (prepared in the previous steps) to optimize the model for INT8 precision.

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/coco/input.txt --output_dlc /models/yolo_nas_s_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/coco/input.txt --output_dlc /models/yolo_nas_s_int8.dlc'
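
To make the idea of 8-bit quantization concrete, the sketch below shows generic affine quantization in plain Python: a float range is mapped onto the integers 0 to 255 through a scale and a zero point. This illustrates the general technique only; it is an assumption that it resembles what `snpe-dlc-quantize` does internally, and all function names are hypothetical.

```python
def quantization_params(min_value: float, max_value: float, bits: int = 8):
    """Return (scale, zero_point) mapping [min_value, max_value] onto unsigned ints."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_value, max_value = min(min_value, 0.0), max(max_value, 0.0)
    levels = 2 ** bits - 1
    scale = (max_value - min_value) / levels
    if scale == 0:
        raise ValueError("range must not be empty")
    zero_point = round(-min_value / scale)
    return scale, zero_point


def quantize(value: float, scale: float, zero_point: int, bits: int = 8) -> int:
    """Map a float to the nearest integer level, clamped to the valid range."""
    level = round(value / scale) + zero_point
    return max(0, min(2 ** bits - 1, level))


def dequantize(level: int, scale: float, zero_point: int) -> float:
    """Map an integer level back to its approximate float value."""
    return (level - zero_point) * scale


# Inputs in this notebook are normalized to [0, 1], so use that range here.
scale, zero_point = quantization_params(0.0, 1.0)
for value in (0.0, 0.25, 0.5, 1.0):
    level = quantize(value, scale, zero_point)
    print(value, "->", level, "->", dequantize(level, scale, zero_point))
```

The round trip loses at most about half a quantization step per value, which is why the observed value range, and therefore the calibration data, matters for accuracy.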

### 4. Post-Quantization Inspection

After quantization, the model is inspected again to verify the changes and ensure the quantized model is ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8.dlc'

### 5. Hardware-Specific Graph Preparation

To further optimize the model for a specific Qualcomm® SoC (e.g., SM7325), the `snpe-dlc-graph-prepare` tool is used. This step configures the model's output tensors and prepares it for execution on the target hardware's HTP (Hexagon Tensor Processor).

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc'

### 6. Final Model Inspection

A final inspection confirms that the model is correctly prepared for the target hardware and ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc'

## Model Optimization (YOLO-hagRID)
This section details the optimization workflow for the YOLO-hagRID model, following the same steps as with YOLO-NAS S. The process includes converting the ONNX model to SNPE DLC format, quantizing for efficient inference, and preparing the model for deployment on Qualcomm® hardware. Each step ensures the model is tailored for high performance and low latency on edge devices.
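
Because both models go through an identical sequence, the six commands can be generated from the model name and its calibration dataset. The sketch below is one way to do that; `snpe_commands` is a hypothetical helper, and it uses only the tools, flags, and paths that already appear in this notebook.

```python
def snpe_commands(model: str, dataset: str, soc: str = "sm7325") -> list:
    """Return the six SNPE commands for one model, in execution order."""
    base = f"/models/{model}"
    return [
        # 1. Convert ONNX to a float32 DLC.
        f"snpe-onnx-to-dlc -i {base}.onnx -o {base}_fp32.dlc",
        # 2. Inspect the float32 DLC.
        f"snpe-dlc-info -i {base}_fp32.dlc",
        # 3. Quantize to INT8 using the dataset's calibration list.
        f"snpe-dlc-quantize --input_dlc {base}_fp32.dlc "
        f"--input_list ./raw/{dataset}/input.txt --output_dlc {base}_int8.dlc",
        # 4. Inspect the quantized DLC.
        f"snpe-dlc-info -i {base}_int8.dlc",
        # 5. Prepare the graph for the target SoC's HTP.
        f"snpe-dlc-graph-prepare --input_dlc {base}_int8.dlc "
        f"--set_output_tensors=output_bboxes,output_classes "
        f"--htp_socs={soc} --output_dlc={base}_int8_htp_{soc}.dlc",
        # 6. Inspect the final DLC.
        f"snpe-dlc-info -i {base}_int8_htp_{soc}.dlc",
    ]


for command in snpe_commands("yolo_hagRID", "hagRID"):
    print(command)
```

The printed commands match the ones listed step by step in this section, so the helper is mainly useful for avoiding copy-and-paste mistakes when another model or SoC is added.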

### 1. Model Conversion

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc
```

In [None]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc'

### 2. Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc'

### 3. Model Quantization

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/hagRID/input.txt --output_dlc /models/yolo_hagRID_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/hagRID/input.txt --output_dlc /models/yolo_hagRID_int8.dlc'

### 4. Post-Quantization Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8.dlc'

### 5. Hardware-Specific Graph Preparation

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc'

### 6. Final Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc'

## Model Optimization (TensorFlow to TFLite)
This section demonstrates how to convert TensorFlow SavedModels to TensorFlow Lite (TFLite) format for deployment on mobile and edge devices. Each subsection iterates over two models (`yolo_nas_s` and `yolo_hagRID`), loads the SavedModel from `/models/<model>`, and writes a `.tflite` file in one of three precisions: float32, float16, or Int8. The float32 variant keeps the original precision, while the float16 and Int8 variants trade some precision for smaller files and faster inference on supported hardware.

### Converting the model to Float32 (CPU, GPU and NNAPI)
The following code block iterates over both the YOLO-NAS S and YOLO-hagRID models, loading each TensorFlow SavedModel and converting it to the TFLite format. No quantization is applied, so weights and activations stay in float32 and the original model accuracy is preserved. Restricting the converter to built-in TFLite operations (`TFLITE_BUILTINS`) keeps the resulting models compatible with a wide range of delegates, including CPU, GPU, and NNAPI. Each converted model is saved as a `.tflite` file, ready for deployment and inference on supported devices.

In [None]:
for model in ["yolo_nas_s", "yolo_hagRID"]:

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Restrict to built-in TFLite ops (for max delegate
    # compatibility). No quantization is applied, so the model stays float32.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

    # Step 3: Convert the model.
    tflite_model = converter.convert()

    # Step 4: Save the model.
    output_path = f"/models/{model}_float32.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Float32 model saved to: {output_path}")

### Converting the model to Float16 (CPU and GPU)
This section demonstrates how to convert TensorFlow SavedModels to TensorFlow Lite (TFLite) format using float16 precision for deployment on mobile and edge devices. The process iterates over both the YOLO-NAS S and YOLO-hagRID models, loading each SavedModel and configuring the TFLiteConverter to:

- Enable default optimizations for improved performance and reduced model size.
- Target float16 as the storage type for weights, which roughly halves the model size. The GPU delegate can compute directly in float16, while the CPU dequantizes the weights to float32 at run time.
- Restrict the converter to use only built-in TFLite operations for maximum compatibility.

The converted float16 models are saved as `.tflite` files, making them suitable for efficient inference on devices that support float16 execution. This approach balances model size, speed, and accuracy for deployment on a wide range of hardware.

In [None]:
for model in ["yolo_nas_s", "yolo_hagRID"]:

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Enable optimization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Step 3: Set float16 as the target precision.
    converter.target_spec.supported_types = [tf.float16]

    # Step 4: Use only built-in TFLite ops (TFLITE_BUILTINS).
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

    # Step 5: Convert the model.
    tflite_model = converter.convert()

    # Step 6: Save the converted model.
    output_path = f"/models/{model}_float16.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Float16 model saved to: {output_path}")
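
A quick way to see the effect of float16 conversion is to compare file sizes on disk. The sketch below uses only the standard library; `size_report` is a hypothetical helper, and the paths are the ones written by the float32 and float16 cells of this notebook. The float16 file is expected to be roughly half the size of the float32 one, although the exact ratio depends on the model.

```python
import os


def size_report(paths: list) -> dict:
    """Map each existing file path to its size in MiB; missing files are skipped."""
    return {
        path: os.path.getsize(path) / (1024 * 1024)
        for path in paths
        if os.path.exists(path)
    }


for model in ["yolo_nas_s", "yolo_hagRID"]:
    report = size_report([
        f"/models/{model}_float32.tflite",
        f"/models/{model}_float16.tflite",
    ])
    for path, mib in report.items():
        print(f"{path}: {mib:.2f} MiB")
```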

### Converting the model to Int8 (CPU and NNAPI)
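This section demonstrates how to convert TensorFlow SavedModels to Int8-quantized TFLite models for deployment on mobile and edge devices. Because float built-in operations remain allowed as a fallback, the result is not guaranteed to be integer-only. The process involves: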

- Loading each SavedModel for the YOLO-NAS S and YOLO-hagRID models.
- Enabling default optimizations to allow quantization.
- Defining a representative dataset generator, which provides sample input data to calibrate the quantization ranges. The generator in this notebook yields random tensors as a placeholder; real preprocessed images generally give more accurate calibration.
- Allowing both the Int8 and the float built-in TFLite operations, so that operations without an Int8 implementation fall back to float.
- Converting the model and saving the quantized Int8 TFLite file.

This approach produces highly efficient models suitable for real-time inference on devices that support TFLite Int8 execution, leveraging hardware acceleration where available.

In [None]:
for model in ["yolo_nas_s", "yolo_hagRID"]:

    # Step 1: Load the SavedModel.
    saved_model_dir = f"/models/{model}"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

    # Step 2: Enable optimizations.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

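    # Step 3: Define representative dataset generator.
    # NOTE: random tensors are placeholders; calibrating on real preprocessed
    # images generally gives more accurate quantization ranges.
    # Replace (1, 320, 320, 3) with your model's input shape if different.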
    def representative_data_gen():
        for _ in range(100):
            dummy_input = np.random.rand(1, 320, 320, 3).astype(np.float32)
            yield [dummy_input]

    converter.representative_dataset = representative_data_gen

    # Step 4: Allow Int8 ops, with float built-ins as a fallback for
    # operations that have no Int8 implementation.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,      # Float fallback ops
        tf.lite.OpsSet.TFLITE_BUILTINS_INT8  # Int8 ops
    ]

    # Step 5: Convert the model.
    tflite_model = converter.convert()

    # Step 6: Save the optimized model to a file.
    output_path = f"/models/{model}_int8.tflite"
    with open(output_path, "wb") as f:
        f.write(tflite_model)

    print(f"✅ Int8 model saved to: {output_path}")
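
The representative dataset can be fed from the `.raw` calibration files prepared earlier instead of random tensors. The following is a minimal sketch under two assumptions that the notebook does not confirm: that each SavedModel takes a single NHWC float32 input of shape `(1, 320, 320, 3)`, and that the files listed in `input.txt` were written by `preprocess` followed by `tofile`. `make_representative_dataset` is a hypothetical helper.

```python
import numpy as np


def make_representative_dataset(input_list: str, size: int = 320, limit: int = 100):
    """Return a generator function yielding calibration samples from .raw files."""

    def generator():
        with open(input_list) as handle:
            paths = [line.strip() for line in handle if line.strip()]
        for path in paths[:limit]:
            # Each file holds size * size * 3 float32 values written by tofile().
            sample = np.fromfile(path, dtype=np.float32)
            # Assumption: the model expects one NHWC input of this shape.
            yield [sample.reshape(1, size, size, 3)]

    return generator
```

With this helper, the assignment in Step 3 would read `converter.representative_dataset = make_representative_dataset("raw/coco/input.txt")` for YOLO-NAS S and use `raw/hagRID/input.txt` for YOLO-hagRID.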

By following these steps, each model is available as an HTP-prepared SNPE DLC for Qualcomm® chipsets and as float32, float16, and Int8 TFLite files for other mobile and edge runtimes. Because quantization can change model outputs, compare the accuracy and latency of each variant on the target device before choosing one for deployment.