# Model Optimization Notebook

This Jupyter notebook prepares calibration images from the COCO and hagRID validation sets and converts two ONNX models, YOLO-NAS S and YOLO-hagRID, to SNPE DLC format for deployment. The workflow includes:

- Importing the libraries needed for image processing and file management.
- Downloading and extracting the COCO and hagRID validation images, then sampling 100 images from each.
- Preprocessing the sampled images into the raw input format used for quantization.
- Converting each ONNX model to SNPE DLC format, quantizing it, and preparing its graph for a specific hardware target.
- Documenting each step for reproducibility and clarity.

This notebook serves as a practical guide for deploying deep learning models on Qualcomm platforms using the SNPE toolkit.

## How to Use

1. **Build and start the Docker Compose environment** as described in the project documentation.
2. **Access this notebook** in your browser at:  
    [http://127.0.0.1:8888/notebooks/model_optimization.ipynb](http://127.0.0.1:8888/notebooks/model_optimization.ipynb)
3. **Run all cells** in order to optimize the ONNX models for Qualcomm chipsets.

In [1]:
# Import necessary libraries.
import glob
import os
import random
import uuid
import torch

import cv2 as cv
import numpy as np

## Data preprocessing

The `preprocess` function resizes an input image to `size` x `size` pixels (320x320 by default), scales its pixel values to the range [0, 1], and returns the result as a float32 NumPy array. The channel order and height-width-channel layout returned by OpenCV are left unchanged.

In [2]:
def preprocess(original_image: np.ndarray, size: int = 320) -> np.ndarray:
    """
    Preprocess the input image for model inference.

    Args:
        original_image (np.ndarray): The input image in BGR format.
        size (int): The target size for resizing the image. Default is 320.

    Returns:
        np.ndarray: The preprocessed image in the format expected by the model.
    """

    # Resize the image to (size X size) pixels and normalize pixel values to
    # [0, 1].
    resized_image = cv.resize(original_image, (size, size))
    return (resized_image / 255.).astype(np.float32)

### Getting the COCO and hagRID datasets
The COCO (Common Objects in Context) dataset is a large-scale image dataset designed for object detection, segmentation, and captioning tasks. This pipeline downloads the COCO 2017 validation images together with the validation split of a hagRID dataset hosted on Hugging Face. From each dataset, 100 randomly selected images (fixed seed 42) are preprocessed and written as raw float32 files: COCO images at 320x320 under `raw/320`, and hagRID images at 640x640 under `raw/640`. These files serve as calibration inputs for the quantization steps, so the quantizer sees real-world data representative of each model's target domain.
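
As a rough sanity check on the raw format, the expected size of each `.raw` file follows directly from the preprocessing: a square image with three channels stored as float32. The sketch below assumes color images (the default for `cv.imread`); the helper name `expected_raw_bytes` is hypothetical and not part of the notebook.

```python
def expected_raw_bytes(size: int, channels: int = 3, bytes_per_value: int = 4) -> int:
    """Bytes in one raw tensor written with ndarray.tofile (no header is stored)."""
    return size * size * channels * bytes_per_value


# Assumption: three channels and float32 values, as produced by `preprocess`.
for size in (320, 640):
    print(f"{size}x{size}: {expected_raw_bytes(size)} bytes per .raw file")
```

Comparing these numbers with the actual file sizes under `raw/320` and `raw/640` is a quick way to spot a wrong resolution or dtype before quantization.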

In [3]:
if not os.path.exists("val2017.zip"):
    !wget http://images.cocodataset.org/zips/val2017.zip -q --show-progress

if not os.path.exists("val2017"):
    !unzip val2017.zip

if not os.path.exists("val.zip"):
    !wget https://huggingface.co/datasets/testdummyvt/hagRIDv2_512px_10GB/resolve/main/val.zip -q --show-progress

if not os.path.exists("val"):
    !unzip val.zip

if not os.path.exists("raw"):
    !mkdir "raw"
    !mkdir "raw/320"
    !mkdir "raw/640"

    random.seed(42)
    for size, folder in zip([320, 640], ["val2017", "val"]):
        filenames = glob.glob(f"{folder}/**/*.jpg", recursive=True)
        random.shuffle(filenames)

        for filename in filenames[:100]:
            image = cv.imread(filename)
            if image is None:
                # cv.imread returns None for files it cannot decode; skip them.
                continue

            # Preprocess the image and save it in raw format.
            normalized_image = preprocess(image, size=size)
            normalized_image.tofile(f"raw/{size}/{uuid.uuid4()}.raw")

!zsh -c 'find raw/320 -name "*.raw" > ./raw/320/input.txt'
!zsh -c 'find raw/640 -name "*.raw" > ./raw/640/input.txt'

Archive:  val.zip
   creating: val/
  inflating: __MACOSX/._val          
   creating: val/three_gun/
  inflating: __MACOSX/val/._three_gun  
   creating: val/hand_heart2/
  inflating: __MACOSX/val/._hand_heart2  
   creating: val/dislike/
  inflating: __MACOSX/val/._dislike  
   creating: val/like/
  inflating: __MACOSX/val/._like     
   creating: val/hand_heart/
  inflating: __MACOSX/val/._hand_heart  
  inflating: val/.DS_Store           
  inflating: __MACOSX/val/._.DS_Store  
   creating: val/thumb_index2/
  inflating: __MACOSX/val/._thumb_index2  
   creating: val/stop_inverted/
  inflating: __MACOSX/val/._stop_inverted  
   creating: val/ok/
  inflating: __MACOSX/val/._ok       
   creating: val/mute/
  inflating: __MACOSX/val/._mute     
   creating: val/point/
  inflating: __MACOSX/val/._point    
   creating: val/two_up/
  inflating: __MACOSX/val/._two_up   
   creating: val/timeout/
  inflating: __MACOSX/val/._timeout  
   creating: val/take_picture/
  inflating: __MACOSX/v
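
The input lists are built with `find` through `zsh` in the cell above. For environments without that shell, a minimal Python sketch of the same step could look like the following. The function name `write_input_list` is hypothetical, and the sketch assumes all `.raw` files sit directly inside the given directory, as they do in this notebook.

```python
from pathlib import Path


def write_input_list(raw_dir: str, list_name: str = "input.txt") -> int:
    """Write one .raw path per line into raw_dir/list_name and return the count."""
    root = Path(raw_dir)
    # Sort so the list is reproducible; `find` gives no ordering guarantee.
    raw_files = sorted(root.glob("*.raw"))
    lines = "".join(f"{path.as_posix()}\n" for path in raw_files)
    (root / list_name).write_text(lines)
    return len(raw_files)
```

Calling `write_input_list("raw/320")` and `write_input_list("raw/640")` after the preprocessing loop would produce lists with paths relative to the notebook's working directory, the same form `find raw/320 ...` emits.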

## Model Optimization (YOLO-NAS S)
This section covers the process of optimizing a deep learning model for deployment on Qualcomm® chipsets using the Qualcomm® Neural Processing SDK for AI (SNPE). The workflow includes converting a YOLO NAS ONNX model to the SNPE DLC format, quantizing the model for efficient inference, and preparing the model for specific hardware targets.

### 1. Model Conversion

The first step is to convert the ONNX model to the SNPE Deep Learning Container (DLC) format. This is achieved using the `snpe-onnx-to-dlc` tool, which translates the ONNX model into a format compatible with Qualcomm® hardware accelerators.

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc
```

In [4]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_nas_s.onnx -o /models/yolo_nas_s_fp32.dlc'

2025-05-31 17:41:20,591 - 270 - INFO - Input shape info 
2025-05-31 17:41:23,218 - 270 - INFO - INFO_INITIALIZATION_SUCCESS: 
2025-05-31 17:41:23,880 - 270 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2025-05-31 17:41:23,897 - 270 - INFO - INFO_WRITE_SUCCESS: 
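
When the conversion is scripted rather than run by hand, it can help to check the log for the success marker shown above. The sketch below is one possible wrapper, not part of the SNPE toolkit: the function names are hypothetical, only the `-i` and `-o` flags from the command above are used, and stdout and stderr are merged because this notebook does not show which stream carries the log.

```python
import shutil
import subprocess


def conversion_succeeded(log_text: str) -> bool:
    """Return True if the converter log contains the success marker."""
    return "INFO_CONVERSION_SUCCESS" in log_text


def convert_onnx_to_dlc(onnx_path: str, dlc_path: str) -> bool:
    """Run snpe-onnx-to-dlc and report whether it exited cleanly and logged success."""
    if shutil.which("snpe-onnx-to-dlc") is None:
        raise FileNotFoundError("snpe-onnx-to-dlc is not on PATH")
    result = subprocess.run(
        ["snpe-onnx-to-dlc", "-i", onnx_path, "-o", dlc_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into result.stdout
        text=True,
        check=False,
    )
    return result.returncode == 0 and conversion_succeeded(result.stdout)
```

Inside the Docker environment, `convert_onnx_to_dlc("/models/yolo_nas_s.onnx", "/models/yolo_nas_s_fp32.dlc")` would mirror the cell above.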


### 2. Model Inspection

After conversion, the `snpe-dlc-info` tool is used to inspect the DLC file. This step ensures that the model has been correctly converted and provides information about the model's input and output tensors.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc
```

In [5]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_fp32.dlc'

DLC info of: /models/yolo_nas_s_fp32.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; unroll_gru_time_steps=True; quantization_overrides=; prepare_inputs_as_params=False; perform_axes_to_spatial_first_order=True; unroll_lstm_time_steps=True; packed_max_seq=1; validation_target=[]; packed_masked_softmax_inputs=[]; preserve_io=[]; debug=-1; dump_qairt_io_config_yaml=; multi_time_steps_lstm=False; multi_time_steps_gru=False; preserve_onnx_output_order=False; perform_layout_transformation=False; keep_quant_nodes=False; input_dtype=[]; keep_disconnected_nodes=False; input_encoding=[]; handle_gather_negative_indices=True; copyright_file=None; force_prune_cast_ops=False; package_name=None; use_convert_quantization_nodes=False; defer_loading=False; dry_run=None; input_type=[]; float_bitwidth=32; out_names=['output_bboxes', 'output_classes']; float_bw=32; model_version=N
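
The `Converter command:` line packs every converter option into a single semicolon-separated string, which is hard to read. As an illustration, the sketch below splits such a line into a dictionary. It assumes that option values never contain a semicolon, which holds for the output shown here but is not guaranteed; `parse_converter_command` is a hypothetical helper, and the sample string is an abbreviated copy of the line above.

```python
def parse_converter_command(line: str) -> dict:
    """Split a 'Converter command:' line into the tool name and key=value options."""
    _, _, payload = line.partition("Converter command:")
    parts = [part.strip() for part in payload.split(";") if part.strip()]
    options = {"tool": parts[0]} if parts else {}
    for part in parts[1:]:
        # Split on the first '=' only; values are kept as raw strings.
        key, _, value = part.partition("=")
        options[key] = value
    return options


sample = (
    "Converter command: snpe-onnx-to-dlc; float_bitwidth=32; "
    "out_names=['output_bboxes', 'output_classes']; float_bw=32"
)
options = parse_converter_command(sample)
print(options["tool"], options["out_names"], options["float_bitwidth"])
```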

### 3. Model Quantization

Quantization reduces the model size and typically speeds up inference by representing the model's floating-point values as 8-bit integers. The `snpe-dlc-quantize` tool runs the calibration images listed in `raw/320/input.txt` (prepared in the previous steps) through the model to determine the quantization parameters for INT8 precision.

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/320/input.txt --output_dlc /models/yolo_nas_s_int8.dlc
```

In [6]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_nas_s_fp32.dlc --input_list ./raw/320/input.txt --output_dlc /models/yolo_nas_s_int8.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Processed command-line arguments
     1.9ms [  INFO ] Inferences will run in sync mode
[INFO] Quantized parameters
Processing inference input(s):
raw/320/e879cc9f-8c45-4728-a4d8-44a2ec37a502.raw
raw/320/c1b61ba7-26ea-4a4e-8314-aebfd8e10e32.raw
raw/320/fbf9d7ae-cae2-4552-b3af-a36b7ebcc7b4.raw
raw/320/7bc0f554-ad68-41d4-9cef-6667d2550bbc.raw
raw/320/f594fa6a-cf94-423c-bc40-bd2890dde999.raw
raw/320/f5399ea6-3fe7-4c3d-9b39-bbbfb867f681.raw
raw/320/5113b6fa-7b30-4a46-a9b3-3cc27d24021a.raw
raw/320/a987f2a9-ad95-41b6-95b6-538eea4ce834.raw
raw/320/0508edae-0014-4ed2-9e42-bd62d2dbf4cf.raw
raw/320/f71453a3-769c-4c22-8fda-1963af1e926a.raw
raw/320/94a62290-8642-4e6d-b980-25951c35a074.raw
raw/320/177362a0-83d1-4cf2-8307-0e6691049316.raw
raw/320/c30cfd34-8c86-44b0-8e62-4257d996924b.raw
raw/320/d9db7546-8112-4625-964c-d9db4b4e6bc5.raw
raw/320/73af4a3f-1b37-4ce2-827a-7b915673712c.raw
raw/320/b9b25790-f694-42a2-97ac-ada367e8362a.raw
raw/320/9cc813e4
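
To make the idea of 8-bit quantization concrete, here is a minimal pure-Python sketch of min/max affine quantization. It is a generic textbook scheme and is not claimed to be the algorithm `snpe-dlc-quantize` uses internally; all function names are hypothetical.

```python
def quantization_params(values, num_bits=8):
    """Derive a scale and zero point from the observed min/max of calibration values."""
    qmax = 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


calibration = [0.0, 0.25, 0.5, 1.0]  # e.g. pixel values already scaled to [0, 1]
scale, zero_point = quantization_params(calibration)
quantized = quantize(calibration, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(calibration, restored))
print(f"scale={scale:.6f} zero_point={zero_point} max_error={max_error:.6f}")
```

The round-trip error stays within about half a quantization step, which is why the calibration data should cover the value range the model will see in deployment: a range that is too wide wastes resolution, and one that is too narrow causes clamping.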

### 4. Post-Quantization Inspection

After quantization, the model is inspected again to verify the changes and ensure the quantized model is ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8.dlc
```

In [7]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8.dlc'

DLC info of: /models/yolo_nas_s_int8.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_run=None; us

### 5. Hardware-Specific Graph Preparation

To further optimize the model for a specific Qualcomm® SoC (e.g., SM7325), the `snpe-dlc-graph-prepare` tool is used. This step configures the model's output tensors and prepares it for execution on the target hardware's HTP (Hexagon Tensor Processor).

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [8]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_nas_s_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_nas_s_int8_htp_sm7325.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM7325
[USER_INFO] Target device backend record identifier: HTP_V68_SM7325_2MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V68_SM7325_2MB). Creating a new record
[USER_INFO] Checking unsigned PD session
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x17618bc0
[INFO] Found Interface Provider (v2.25)
[USER_INFO] Platform option not set
[USER_INFO] Created ctx=0x1 for Graph Id=0 backend=HTP SNPE Id=0x17480728
[USER_INFO] Context [0x1] Setting priority to: default
[USER_INFO] Offline Prepare VTCM size(MB) selected = 0
[USER_INFO] Offline Prepare Optimization Level passed = 2

spill_bytes=2191360
fill_bytes=2191360
write_total_bytes=2371584
read_total_bytes=18309120

[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle=0x1 for Graph Id=0 backend=HTP SN
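
The prepare log prints several byte counters as plain `name=value` lines. If these need to be compared across models or SoCs, a small parser such as the hypothetical `parse_byte_counters` below can collect them. The sketch only extracts the numbers; it makes no assumption about what each counter measures, since the notebook does not document that.

```python
import re


def parse_byte_counters(log_text: str) -> dict:
    """Collect lines of the form '<name>_bytes=<integer>' into a dict of ints."""
    pattern = re.compile(r"^(\w+_bytes)=(\d+)\s*$", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(log_text)}


# Values copied from the log output above.
log_excerpt = """\
spill_bytes=2191360
fill_bytes=2191360
write_total_bytes=2371584
read_total_bytes=18309120
"""
counters = parse_byte_counters(log_excerpt)
for name, value in counters.items():
    print(f"{name}: {value / 2**20:.2f} MiB")
```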

### 6. Final Model Inspection

A final inspection confirms that the model is correctly prepared for the target hardware and ready for deployment.

**Command:**
```
snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc
```

In [9]:
!zsh -c 'snpe-dlc-info -i /models/yolo_nas_s_int8_htp_sm7325.dlc'

DLC info of: /models/yolo_nas_s_int8_htp_sm7325.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; dump_inferred_model=False; input_dim=None; converter_op_package_lib=; expand_lstm_op_structure=False; adjust_nms_features_dims=True; define_symbol=None; squash_box_decoder=True; inject_cast_for_gather=True; match_caffe_ssd_to_tf=True; custom_io=; enable_match_topk=False; apply_masked_softmax=uncompressed; dumpIR=False; no_simplification=False; dump_custom_io_config_template=; dump_value_info=False; expand_gru_op_structure=True; input_layout=[]; align_matmul_ranks=True; preprocess_roi_pool_inputs=True; custom_op_config_paths=None; enable_framework_trace=False; batch=None; keep_int64_inputs=False; float_bias_bw=0; extract_color_transform=True; model_version=None; float_bias_bitwidth=0; float_bw=32; out_names=['output_bboxes', 'output_classes']; float_bitwidth=32; dry_r

## Model Optimization (YOLO-hagRID)
This section applies the same workflow to the YOLO-hagRID model: converting the ONNX model to SNPE DLC format, quantizing it to INT8, and preparing it for the HTP on Qualcomm® hardware. The commands differ from the YOLO-NAS S section only in the model file names and in the calibration list, which points to the 640x640 images in `raw/640/input.txt`.
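
Because both models go through the same six commands, the sequence can be generated from the model's file stem. The sketch below is illustrative only: `pipeline_commands` is a hypothetical helper, and it assumes the `/models/<stem>...` naming pattern and the output tensor names used throughout this notebook.

```python
def pipeline_commands(stem: str, input_list: str, soc: str = "sm7325",
                      model_dir: str = "/models") -> list:
    """Return the six shell commands of the workflow for one model, in order."""
    fp32 = f"{model_dir}/{stem}_fp32.dlc"
    int8 = f"{model_dir}/{stem}_int8.dlc"
    htp = f"{model_dir}/{stem}_int8_htp_{soc}.dlc"
    return [
        f"snpe-onnx-to-dlc -i {model_dir}/{stem}.onnx -o {fp32}",
        f"snpe-dlc-info -i {fp32}",
        f"snpe-dlc-quantize --input_dlc {fp32} --input_list {input_list} --output_dlc {int8}",
        f"snpe-dlc-info -i {int8}",
        "snpe-dlc-graph-prepare "
        f"--input_dlc {int8} --set_output_tensors=output_bboxes,output_classes "
        f"--htp_socs={soc} --output_dlc={htp}",
        f"snpe-dlc-info -i {htp}",
    ]


commands = pipeline_commands("yolo_hagRID", "./raw/640/input.txt")
for command in commands:
    print(command)
```

Calling the helper with `"yolo_nas_s"` and `"./raw/320/input.txt"` reproduces the commands from the previous section.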

### 1. Model Conversion

**Command:**
```
snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc
```

In [10]:
!zsh -c 'snpe-onnx-to-dlc -i /models/yolo_hagRID.onnx -o /models/yolo_hagRID_fp32.dlc'

2025-05-31 17:42:04,221 - 270 - INFO - Input shape info 
2025-05-31 17:42:06,328 - 270 - INFO - INFO_INITIALIZATION_SUCCESS: 
2025-05-31 17:42:06,500 - 270 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2025-05-31 17:42:06,506 - 270 - INFO - INFO_WRITE_SUCCESS: 


### 2. Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc
```

In [11]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_fp32.dlc'

DLC info of: /models/yolo_hagRID_fp32.dlc
                                                                                                    
Model Version: 
Model Copyright:
Converter command: snpe-onnx-to-dlc; unroll_gru_time_steps=True; quantization_overrides=; prepare_inputs_as_params=False; perform_axes_to_spatial_first_order=True; unroll_lstm_time_steps=True; packed_max_seq=1; validation_target=[]; packed_masked_softmax_inputs=[]; preserve_io=[]; debug=-1; dump_qairt_io_config_yaml=; multi_time_steps_lstm=False; multi_time_steps_gru=False; preserve_onnx_output_order=False; perform_layout_transformation=False; keep_quant_nodes=False; input_dtype=[]; keep_disconnected_nodes=False; input_encoding=[]; handle_gather_negative_indices=True; copyright_file=None; force_prune_cast_ops=False; package_name=None; use_convert_quantization_nodes=False; defer_loading=False; dry_run=None; input_type=[]; float_bitwidth=32; out_names=['output_bboxes', 'output_classes']; float_bw=32; model_version=

### 3. Model Quantization

**Command:**
```
snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/640/input.txt --output_dlc /models/yolo_hagRID_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-quantize --input_dlc /models/yolo_hagRID_fp32.dlc --input_list ./raw/640/input.txt --output_dlc /models/yolo_hagRID_int8.dlc'

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Processed command-line arguments
     2.1ms [  INFO ] Inferences will run in sync mode
Processing inference input(s):
raw/640/89b8b13e-737a-4fd5-a27d-230d21e3a9c1.raw
raw/640/91a88433-0fd6-4477-9394-ef7267f6ab03.raw
raw/640/84587c23-f005-4ab8-a8c0-c4e60c2e23c3.raw
raw/640/bfb12cff-df97-4c35-8998-87286727aa75.raw
raw/640/d4b2f903-52e9-4983-9458-5361445db2b6.raw
raw/640/232f315e-5910-43a0-b1a5-62d188b17118.raw
raw/640/546eb02e-7d9e-42b9-ad40-c842bd1a8420.raw
raw/640/d48028af-4159-40cb-aa0c-05bf452f6284.raw
raw/640/2d0da6a0-c456-4fdc-b2e2-53717e2c5dbe.raw
raw/640/b61fb21d-720d-4cb4-bf3a-3ac29b614da7.raw
raw/640/29d2d155-0009-41bc-98bc-ed141cbc5f44.raw
raw/640/aff20315-462d-436b-abe4-0cc98e44d062.raw
raw/640/8a455689-f954-4a38-9ebb-b295ce3a0fc8.raw
raw/640/b9d42ff3-1bad-4a4e-ac7c-18cb1a91a8eb.raw
raw/640/cc9b63b9-4eb9-4d61-9279-2e2c31f24902.raw
raw/640/8f3ab39e-94ee-4688-9df7-a0bdfaeeff6e.raw
raw/640/c9282b54-5780-4613-94ba-298e5810cf10

### 4. Post-Quantization Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8.dlc'

### 5. Hardware-Specific Graph Preparation

**Command:**
```
snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-graph-prepare --input_dlc /models/yolo_hagRID_int8.dlc --set_output_tensors=output_bboxes,output_classes --htp_socs=sm7325 --output_dlc=/models/yolo_hagRID_int8_htp_sm7325.dlc'

### 6. Final Model Inspection

**Command:**
```
snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc
```

In [None]:
!zsh -c 'snpe-dlc-info -i /models/yolo_hagRID_int8_htp_sm7325.dlc'

Following these steps produces a DLC file that is quantized to INT8 and prepared offline for the HTP of the target SoC (SM7325 in this notebook), so inference can use hardware acceleration on edge devices powered by Qualcomm® chipsets. Because quantization can change a model's predictions, the accuracy of the quantized model should be checked on representative data before deployment.