<a href="https://colab.research.google.com/github/marcory-hub/hailo-colab/blob/main/onnx_har_hef.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# From onnx via har to hef

The goal of this notebook is to build a `HEF` file, the file format that runs on the `hailo-8l` device on the AI Kit. For a schematic overview and more details, see the Hailo docs on the [model build process](https://hailo.ai/developer-zone/documentation/dataflow-compiler-v3-29-0/?sp_referrer=overview/overview.html).

Credits to trieut415! Much of the Hailo code in this notebook is adapted from, and inspired by, his [post](https://community.hailo.ai/t/guide-to-using-the-dfc-to-convert-a-modified-yolov11-on-google-colab/7131/3) in the Hailo community. In particular, running the Dataflow Compiler in a virtual environment solved my initial problem, and his code block for creating the calibration data is more robust.

## Overview
### 1. Parsing from onnx to har:
- Input: onnx file
- Output: har file (model representation and parameters (32-bit weights))

### 2. Model Optimization:
- input: har file (32-bit) and calibration images
- output: har file (optimized model representation and parameters (quantized weights))

  This step converts the float32 parameters in the har file to integers. To do so, the model is run in emulation (native mode) on a small set of unannotated images.

  #### Substeps

  1. Prepare calibration set
  2. Load har file (32-bit) from model conversion
  3. Create model script


### 3. Model Compilation:
- input: har (optimized)
- output: hef

  The quantized model is compiled into a specific binary format called HEF (Hailo Executable Format). This format is optimized for the Hailo device's architecture and allows for efficient execution of the model's operations.
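
As a small orientation aid, the sketch below lists the file each of the three stages is expected to write, following the naming convention used by the scripts later in this notebook (`<model>_hailo_model.har`, `<model>_quantized_model.har`, `<model>.hef`). The helper name `stage_artifacts` is hypothetical and not part of the Hailo SDK.

```python
def stage_artifacts(model_name: str) -> dict:
    """Map each build stage to the file it is expected to write."""
    return {
        "parse": f"{model_name}_hailo_model.har",         # float32 har
        "optimize": f"{model_name}_quantized_model.har",  # quantized har
        "compile": f"{model_name}.hef",                   # binary for the device
    }


for stage, filename in stage_artifacts("best_opset9").items():
    print(f"{stage:>8} -> {filename}")
```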

## Before you start
1. Download the Hailo Dataflow Compiler from https://hailo.ai/developer-zone/software-downloads/ (you need to create an account) and upload it to your Google Drive. To check the Python version of Colab, run the command below.
2. Collect a set of 1024 images for calibration. These images need no annotation, but should be representative. Zip them, preferably with the name calibrationDataset.zip (on macOS use `ditto -c -k --norsrc --keepParent calibrationDataset calibrationDataset.zip`).
3. Spin up a Colab runtime with a GPU, which is needed for the optimization step.

In [None]:
!python --version
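
The wheel referenced later in this notebook is labeled as a Python 3.10 build, so a quick check like the one below can flag a mismatch early. This is only a minimal sketch: `REQUIRED` is an assumption taken from that download note and should be adjusted to the wheel you actually downloaded.

```python
import sys

REQUIRED = (3, 10)  # assumption: the Python version your DFC wheel targets


def python_matches(required=REQUIRED, current=None):
    """Return True when the interpreter's major.minor equals `required`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) == tuple(required)


if not python_matches():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, "
        f"expected {REQUIRED[0]}.{REQUIRED[1]}"
    )
```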

## Install Dataflow Compiler (DFC) in virtual environment (venv)

In [None]:
# Mount google drive
from google.colab import drive

drive.mount('/content/gdrive')

In [None]:
# Make virtual environment

# Update and install packages needed for DFC
!sudo apt-get update
!sudo apt-get install -y python3-dev python3-distutils python3-tk libfuse2 graphviz libgraphviz-dev

# Will need a venv to install the DFC in
!pip install --upgrade pip virtualenv
!virtualenv my_env

For the next codeblock, make sure you have downloaded the Hailo Dataflow Compiler (Python 3.10) from https://hailo.ai/developer-zone/software-downloads/ and copied the .whl to your Google Drive. Change the filename if there has been an update I missed.


In [None]:
#Installing the Dataflow Compiler, update the filename if needed
!my_env/bin/pip install /content/gdrive/MyDrive/hailo_dataflow_compiler-3.29.0-py3-none-linux_x86_64.whl

# Check the version and show help information
!my_env/bin/hailo --version
!my_env/bin/hailo -h
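
If the install cell above fails with a "file not found" error, it may help to list which DFC wheels are actually present in the Drive folder. The sketch below uses only the standard library; the helper name `find_dfc_wheels` is hypothetical, and the folder is assumed to be the Drive root used in the install command.

```python
from pathlib import Path


def find_dfc_wheels(folder):
    """Return the DFC wheel files found directly in `folder`, sorted by name."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("hailo_dataflow_compiler-*.whl"))


wheels = find_dfc_wheels("/content/gdrive/MyDrive")
if wheels:
    for wheel in wheels:
        print(wheel.name)
else:
    print("No DFC wheel found; check the upload location and the Drive mount.")
```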

## 1.  Parsing from onnx --> har

1. Select the hardware architecture. For the Raspberry Pi AI Kit it is `hailo8l`.
2. Open the [netron](https://netron.app/) site, click `Open Model` and select your onnx file on your local computer.
3. Identify the end nodes: these are the nodes right before the post-processing operations at the very bottom of the model. There are 2 end nodes per feature map. I searched for `onnx::Reshape` to find the two `Conv` layers that feed into each `onnx::Reshape`.

  In an unmodified YOLO11 model these are the end nodes (the head index can differ between versions, for example `model.22` in YOLOv8, so verify the names in Netron):
  ```
"/model.23/cv2.2/cv2.2.2/Conv",
"/model.23/cv3.2/cv3.2.2/Conv",
"/model.23/cv2.1/cv2.1.2/Conv",
"/model.23/cv3.1/cv3.1.2/Conv",
"/model.23/cv2.0/cv2.0.2/Conv",
"/model.23/cv3.0/cv3.0.2/Conv",
```
  If yours differ from the ones listed above, change them in the code block below.
4. Check the net_input_shapes in Netron. Adjust them if your "input layer name": [batch, channels, height, width] differs from:
  ```
  "images": [1, 3, 320, 320]
  ```

5. Run the codeblocks below. The har file is created by the command `runner.translate_onnx_model` and saved with `runner.save_har`. To use the DFC in the venv we make and save the python code in the first codeblock and run it in the venv in the second codeblock. More details about the conversion can be found in the [Parsing tutorial](https://hailo.ai/developer-zone/documentation/dataflow-compiler-v3-29-0/?sp_referrer=tutorials_notebooks/notebooks/DFC_1_Parsing_Tutorial.html).
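
The end-node names listed in step 3 follow a regular pattern: for each output scale there is one `cv2` (box regression) and one `cv3` (class) convolution. The sketch below generates that list so it can be compared with what Netron shows. The function name `yolo_end_nodes` is hypothetical, and the default head index 23 is simply taken from the listing above; treat both as assumptions and confirm them against your own model.

```python
def yolo_end_nodes(head_index=23, num_scales=3):
    """Build the end-node names of a YOLO detect head.

    For each scale i, cv2.i is the box-regression branch and
    cv3.i is the classification branch.
    """
    names = []
    for scale in range(num_scales):
        for branch in ("cv2", "cv3"):
            names.append(
                f"/model.{head_index}/{branch}.{scale}/{branch}.{scale}.2/Conv"
            )
    return names


for name in yolo_end_nodes():
    print(name)
```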

In [None]:
with open("translate_model.py", "w") as f:
    f.write("""

from hailo_sdk_client import ClientRunner

# Set hailo hardware architecture and onnx model and model path
chosen_hw_arch = "hailo8l" # @param ["hailo8l", "hailo8", "hailo8r", "hailo10h", "hailo15h", "hailo15m"]
onnx_model_name = "best_opset9" # @param {type:"string"}
onnx_path = "/content/gdrive/MyDrive/best_opset9.onnx" # @param {type:"string"}

print("Starting model translation...")

# Initialize the ClientRunner
runner = ClientRunner(hw_arch=chosen_hw_arch)

# Change the end_node_names if Netron shows different end nodes
end_node_names = [
  "/model.23/cv2.0/cv2.0.2/Conv",
  "/model.23/cv3.0/cv3.0.2/Conv",
  "/model.23/cv2.1/cv2.1.2/Conv",
  "/model.23/cv3.1/cv3.1.2/Conv",
  "/model.23/cv2.2/cv2.2.2/Conv",
  "/model.23/cv3.2/cv3.2.2/Conv",
]

try:
    # Translate the onnx model to har file
    hn, npz = runner.translate_onnx_model(
        onnx_path,
        onnx_model_name,
        end_node_names=end_node_names,
        net_input_shapes={"images": [1, 3, 320, 320]},  # Adjust input shapes if needed
    )
    print("Model translation successful.")
except Exception as e:
    print(f"Error during model translation: {e}")
    raise

# Save the har file
hailo_model_har_name = f"{onnx_model_name}_hailo_model.har"
try:
    runner.save_har(hailo_model_har_name)
    print(f"HAR file saved as: {hailo_model_har_name}")
except Exception as e:
    print(f"Error saving HAR file: {e}")


""")

In [None]:
# Run model in CLI
!my_env/bin/python translate_model.py

## 2. Model optimization
The optimization from Hailo is replaced by the optimization from trieut415's guide.

1. Print dictionary of layers and operations
2. Load har
3. Create model script


1. Print layers

In [None]:
with open("inspect_layers.py", "w") as f:
    f.write("""

from hailo_sdk_client import ClientRunner

# Load the HAR file
har_path = "/content/best_opset9_hailo_model.har" # @param {type:"string"}

runner = ClientRunner(har=har_path)

from pprint import pprint

try:
    # Access the HailoNet as an OrderedDict
    hn_dict = runner.get_hn()  # Or use runner._hn if get_hn() is unavailable
    print("Inspecting layers from HailoNet (OrderedDict):")

    # Pretty-print each layer
    for key, value in hn_dict.items():
        print(f"Key: {key}")
        pprint(value)
        print("\\n" + "="*80 + "\\n")  # Add a separator between layers for clarity

except Exception as e:
    print(f"Error while inspecting hn_dict: {e}")

""")

In [None]:
# Run model in CLI
!my_env/bin/python inspect_layers.py

At the top of the output the `output_layers_order` is printed. It should look like the example below. These renamed layers are used in the codeblock further down, so check them and adjust that codeblock if needed.

```
================================================================================

Key: net_params
OrderedDict([('version', '1.0'),
             ('stage', 'HN'),
             ('clusters_placement', [[]]),
             ('clusters_to_skip', []),
             ('output_layers_order',
              ['best_opset9/conv51',
               'best_opset9/conv54',
               'best_opset9/conv62',
               'best_opset9/conv65',
               'best_opset9/conv77',
               'best_opset9/conv80']),
             ('transposed_net', False),
             ('net_scopes', ['best_opset9'])])

================================================================================
```

Check that the output layers have the correct names in the next codeblock. Adjust if needed.

In [None]:
import json
import os
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive/', force_remount=True)

# Updated NMS layer configuration dictionary
nms_layer_config = {
    "nms_scores_th": 0.3,
    "nms_iou_th": 0.7,
    "image_dims": [320, 320],  # Must match the model input size (320x320 in this notebook)
    "max_proposals_per_class": 25,
    "classes": 1,
    "regression_length": 16,
    "background_removal": False,
    "background_removal_index": 0,
    "bbox_decoders": [
        {
            "name": "bbox_decoder51", # Change the number (51) to the number of the reg_layer
            "stride": 8,              # Typical stride of the first (finest) head; verify for your model
            "reg_layer": "conv51",    # CHECK THIS
            "cls_layer": "conv54"     # CHECK THIS
        },
        {
            "name": "bbox_decoder62", # Change the number (62) to the number of the reg_layer
            "stride": 16,
            "reg_layer": "conv62",    # CHECK THIS
            "cls_layer": "conv65"     # CHECK THIS
        },
        {
            "name": "bbox_decoder77", # Change the number (77) to the number of the reg_layer
            "stride": 32,
            "reg_layer": "conv77",    # CHECK THIS
            "cls_layer": "conv80"     # CHECK THIS
        }
    ]
}

# Path to save the updated JSON configuration
# (must match the path passed to nms_postprocess in the model script)
output_dir = "/content"
os.makedirs(output_dir, exist_ok=True)  # Create the directory if it doesn't exist
output_path = os.path.join(output_dir, "nms_layer_config.json")

# Save the updated configuration as a JSON file
with open(output_path, "w") as json_file:
    json.dump(nms_layer_config, json_file, indent=4)

print(f"NMS layer configuration saved to {output_path}")
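
The decoder entries above can also be derived from the printed `output_layers_order` instead of being typed by hand. The sketch below assumes that the layers come in consecutive (reg, cls) pairs, ordered from the finest to the coarsest scale, and that the strides are 8, 16 and 32; these are assumptions about a typical three-scale YOLO head, not something the Hailo SDK guarantees. The helper name `build_bbox_decoders` is hypothetical.

```python
def build_bbox_decoders(output_layers, strides=(8, 16, 32)):
    """Pair consecutive (reg, cls) layers and attach one stride per pair."""
    # Drop the "<net_name>/" scope prefix: "best_opset9/conv51" -> "conv51"
    layers = [name.split("/")[-1] for name in output_layers]
    if len(layers) != 2 * len(strides):
        raise ValueError("expected exactly two output layers (reg, cls) per stride")

    decoders = []
    for i, stride in enumerate(strides):
        reg_layer, cls_layer = layers[2 * i], layers[2 * i + 1]
        # Reuse the number of the reg layer in the decoder name
        suffix = "".join(ch for ch in reg_layer if ch.isdigit())
        decoders.append({
            "name": f"bbox_decoder{suffix}",
            "stride": stride,
            "reg_layer": reg_layer,
            "cls_layer": cls_layer,
        })
    return decoders


order = [
    "best_opset9/conv51", "best_opset9/conv54",
    "best_opset9/conv62", "best_opset9/conv65",
    "best_opset9/conv77", "best_opset9/conv80",
]
for decoder in build_bbox_decoders(order):
    print(decoder)
```

The returned list can be assigned to `nms_layer_config["bbox_decoders"]` before the configuration is written to JSON.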

### 2.1 Calibration data


- The dataset should contain at least 1024 representative images (not labeled).
- Use a GPU.

1. Unzip the calibration dataset and rename the folder if needed.

In [None]:
from google.colab import drive
import os

drive.mount('/content/gdrive')

# Define Paths with Parameters
calibrationset_path = "/content/gdrive/MyDrive/calibrationDataset.zip"
calibrationset_filename = "calibrationDataset"

try:
  # Unzip the Dataset
  !unzip {calibrationset_path} -d '/content/'

  # Rename the Extracted Folder
  old_path = f'/content/{calibrationset_filename}'
  new_path = '/content/calibrationDataset'
  if os.path.exists(old_path):
    os.rename(old_path, new_path)
  else:
    print(f"Error: {old_path} does not exist.")
except Exception as e:
  print(f"An error occurred: {e}")
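
Since the optimization step expects at least 1024 images, it can be useful to count the images in the archive before going further. The sketch below uses only the standard library and skips the `__MACOSX/` resource entries that macOS archives sometimes contain; the helper name `count_images_in_zip` is hypothetical.

```python
import os
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def count_images_in_zip(zip_path):
    """Count image files in the archive, ignoring macOS resource entries."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(
            1
            for name in zf.namelist()
            if name.lower().endswith(IMAGE_EXTS)
            and not name.startswith("__MACOSX/")
        )


zip_path = "/content/gdrive/MyDrive/calibrationDataset.zip"
if os.path.exists(zip_path):
    n_images = count_images_in_zip(zip_path)
    print(f"{n_images} images found (at least 1024 recommended)")
```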

2. Make the calibration data. Adjust the image size if your input layer has a different size (often it is 640x640).

In [None]:
# Make calibration data for the optimization step

import numpy as np
from PIL import Image
import os
from google.colab import drive


# Paths to directories and files
image_dir = '/content/calibrationDataset'
output_dir = '/content/output_dir'
os.makedirs(output_dir, exist_ok=True)  # Create the directory if it doesn't exist

# File paths for saving calibration data
calibration_data_path = os.path.join(output_dir, "calibration_data.npy")
processed_data_path = os.path.join(output_dir, "processed_calibration_data.npy")

# Initialize an empty list for calibration data
calib_data = []

# Process all image files in the directory
for img_name in os.listdir(image_dir):
    img_path = os.path.join(image_dir, img_name)
    if img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
        img = Image.open(img_path).convert("RGB").resize((320, 320))  # Force 3 channels, then resize to the model input size
        img_array = np.array(img) / 255.0  # Normalize to [0, 1]
        calib_data.append(img_array)

# Convert the calibration data to a NumPy array
calib_data = np.array(calib_data)

# Save the normalized calibration data
np.save(calibration_data_path, calib_data)
print(f"Normalized calibration dataset saved with shape: {calib_data.shape} to {calibration_data_path}")

# Scale the normalized data back to [0, 255]
processed_calibration_data = calib_data * 255.0

# Save the processed calibration data
np.save(processed_data_path, processed_calibration_data)
print(f"Processed calibration dataset saved with shape: {processed_calibration_data.shape} to {processed_data_path}")

# Stop and start runtime after this codeblock!
# even after processing, the calib_data array might still be in memory.

STOP AND RESTART THE SESSION: after running the previous codeblock, stop and restart the session to clear the memory!

TODO: check whether the code below gives the same output; it should not keep the data in memory.

In [None]:
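### Improved code: still needs to be checked and compared with the output of the cell above
# Note: every np.save call below appends a separate array to the same file,
# so a single np.load on that file returns only the first image.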

import numpy as np
from PIL import Image
import os

# Paths to directories and files
image_dir = '/content/dataset/valid/images'
output_dir = '/content/output_dir'
os.makedirs(output_dir, exist_ok=True)  # Create the directory if it doesn't exist

# File paths for saving processed data
calibration_data_path = os.path.join(output_dir, "calibration_data.npy")
processed_data_path = os.path.join(output_dir, "processed_calibration_data.npy")

# Process and save each image incrementally to avoid high memory usage
with open(calibration_data_path, 'wb') as calib_file, open(processed_data_path, 'wb') as processed_file:
    for img_name in os.listdir(image_dir):
        img_path = os.path.join(image_dir, img_name)

        if img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
            # Resize and normalize the image
            img = Image.open(img_path).resize((640, 640))
            img_array = np.array(img) / 255.0  # Normalize to [0, 1]

            # Append the normalized data directly to the file
            np.save(calib_file, img_array, allow_pickle=False)
            print(f"Saved {img_name} normalized data to calibration file.")

            # Scale the normalized data back to [0, 255] and save incrementally
            processed_calibration_data = img_array * 255.0
            np.save(processed_file, processed_calibration_data, allow_pickle=False)
            print(f"Saved {img_name} processed calibration data to file.")

print("All images processed and saved.")
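
One possible way to keep memory usage low while still ending up with a single `(N, H, W, 3)` array is to write into a memory-mapped `.npy` file. The sketch below uses `numpy.lib.format.open_memmap` for that. It is an untested alternative to the cells above, not the method from the original guide: the function names are hypothetical, `float32` is an assumed dtype (the cells above produce `float64`), and the number of images must be known in advance.

```python
import numpy as np


def write_calibration_npy(arrays, count, out_path, size=(320, 320)):
    """Stream `count` HxWx3 arrays into one .npy file without holding them all in RAM.

    `size` is (width, height), as used by PIL; the stored shape is (count, height, width, 3).
    """
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32, shape=(count, size[1], size[0], 3)
    )
    for i, arr in enumerate(arrays):
        out[i] = arr  # written through to disk-backed memory
    out.flush()
    return out.shape


def load_images(image_dir, size=(320, 320)):
    """Yield resized RGB images as float32 arrays with values in [0, 255]."""
    import os
    from PIL import Image  # imported lazily; only needed for real image folders

    for name in sorted(os.listdir(image_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            with Image.open(os.path.join(image_dir, name)) as img:
                yield np.asarray(img.convert("RGB").resize(size), dtype=np.float32)
```

A call such as `write_calibration_npy(load_images(image_dir), n_images, processed_data_path)` would then produce a file that `np.load` reads back as one array, where `n_images` is the number of image files in the folder.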

Now we're finally ready to optimize the model with the script below. Sample .alls files are available for reference; I used yolo10nms.json as a base to create my .alls file.

Note that `change_output_activation` is applied to the classification layers (`cls_layer`); you can verify these with Netron as described above.



In [None]:
with open("optimize_model.py", "w") as f:

    f.write("""

import os
from hailo_sdk_client import ClientRunner

# Define your model's HAR file name
model_name = "best_opset9"
hailo_model_har_name = f"{model_name}_hailo_model.har"


# Ensure the HAR file exists
assert os.path.isfile(f"{model_name}_hailo_model.har")

# Initialize the ClientRunner with the HAR file
runner = ClientRunner(har=hailo_model_har_name)

# Define the model script to add a normalization layer
# Normalization for [0, 1] range
alls = \"\"\"
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv54, sigmoid)
change_output_activation(conv65, sigmoid)
change_output_activation(conv80, sigmoid)
nms_postprocess("/content/nms_layer_config.json", meta_arch=yolov8, engine=cpu)
performance_param(compiler_optimization_level=max)
\"\"\"

# Load the model script into the ClientRunner
runner.load_model_script(alls)

# Define a calibration dataset
# Replace 'calib_dataset' with the actual dataset you're using for calibration
# For example, if it's a directory of images, prepare the dataset accordingly
calib_dataset = "/content/output_dir/processed_calibration_data.npy"

# Perform optimization with the calibration dataset
runner.optimize(calib_dataset)

# Save the optimized model to a new Quantized HAR file
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)

print(f"Quantized HAR file saved to: {quantized_model_har_path}")

""")

In [None]:
!my_env/bin/python optimize_model.py
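
The model script used above contains one `change_output_activation` line per classification layer, so it can be generated from the layer names rather than edited by hand. The sketch below only assembles the same commands that appear in the script above; the helper name `build_model_script` is hypothetical, and the default JSON path is the one used in this notebook.

```python
def build_model_script(cls_layers, nms_config_path="/content/nms_layer_config.json"):
    """Assemble the model script (.alls) text for the given classification layers."""
    lines = ["normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])"]
    # One sigmoid activation per classification output layer
    lines += [f"change_output_activation({layer}, sigmoid)" for layer in cls_layers]
    lines.append(f'nms_postprocess("{nms_config_path}", meta_arch=yolov8, engine=cpu)')
    lines.append("performance_param(compiler_optimization_level=max)")
    return "\n".join(lines) + "\n"


print(build_model_script(["conv54", "conv65", "conv80"]))
```

The resulting string could be passed to `runner.load_model_script` in place of the hard-coded `alls` text.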

## 3. Model compilation

In [None]:
with open("compile_model.py", "w") as f:

    f.write("""
from hailo_sdk_client import ClientRunner

# Define the quantized model HAR file
model_name = "best_opset9"
quantized_model_har_path = f"{model_name}_quantized_model.har"

# Initialize the ClientRunner with the HAR file
runner = ClientRunner(har=quantized_model_har_path)
print("[info] ClientRunner initialized successfully.")

# Compile the model
try:
    hef = runner.compile()
    print("[info] Compilation completed successfully.")
except Exception as e:
    print(f"[error] Failed to compile the model: {e}")
    raise
file_name = f"{model_name}.hef"
with open(file_name, "wb") as f:
    f.write(hef)
""")

In [None]:
!my_env/bin/python compile_model.py

TODO ZIP
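
For the zip step noted above, one option is the standard-library `zipfile` module. The sketch below bundles whichever of the given files exist into one archive; the helper name `zip_files` and the archive name in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_files(paths, zip_path):
    """Write the existing files in `paths` into `zip_path`; return the names added."""
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            if path.is_file():
                zf.write(path, arcname=path.name)  # store without directories
                added.append(path.name)
    return added


# Example usage (uncomment after compilation has produced the HEF):
# zip_files(["best_opset9.hef", "best_opset9_quantized_model.har"], "best_opset9_hailo.zip")
```

The archive can then be copied to the mounted Drive folder or downloaded from the Colab file browser.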