# YOLOv8n Instance Segmentation PyTorch Model - Quantization for IMX500

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/pytorch/pytorch_yolov8n_seg_for_imx500.ipynb)

## Overview

In this tutorial, we will illustrate a basic and quick process of preparing a pre-trained model for deployment using MCT. Specifically, we will demonstrate how to download a pre-trained YOLOv8n instance segmentation model from the MCT Models Library, compress it, and make it deployment-ready using MCT's post-training quantization techniques.

We will use an existing pre-trained YOLOv8n instance segmentation model based on [Ultralytics](https://github.com/ultralytics/ultralytics). The model was slightly adjusted for model quantization. We will quantize the model using MCT post-training quantization and evaluate the performance of the floating-point model and the quantized model on the COCO dataset.


## Summary

In this tutorial we will cover:

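1. Post-training quantization (PTQ) of a PyTorch instance segmentation model using MCT, including mixed-precision weights quantization and gradient-based PTQ (GPTQ).
2. Data preparation - loading and preprocessing validation and representative datasets from COCO.
3. Export of the quantized models to ONNX.
4. Accuracy evaluation of the floating-point and the quantized models.
5. Visualization of model predictions against the ground truth.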

## Setup
### Install the relevant packages

In [None]:
!pip install -q torch
!pip install onnx
!pip install -q pycocotools
!pip install 'huggingface-hub>=0.21.0'

Clone a copy of the [MCT](https://github.com/sony/model_optimization) (Model Compression Toolkit) repository into your current directory. This step ensures that you have access to the [MCT Models Garden](https://github.com/sony/model_optimization/tree/main/tutorials/mct_model_garden) folder, which contains all the necessary utility functions for this tutorial.
**It's important to note that we use the most up-to-date MCT code available.**

In [None]:
import sys
import os
import importlib.util  # importlib.util is a submodule and is not guaranteed to load with a bare `import importlib`

if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit
!git clone https://github.com/sony/model_optimization.git temp_mct && mv temp_mct/tutorials . && \rm -rf temp_mct
sys.path.insert(0,"tutorials")

### Download COCO evaluation set

In [None]:
if not os.path.isdir('coco'):
    !wget -nc http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    !unzip -q -o annotations_trainval2017.zip -d ./coco
    !echo Done loading annotations
    !wget -nc http://images.cocodataset.org/zips/val2017.zip
    !unzip -q -o val2017.zip -d ./coco
    !echo Done loading val2017 images

## Model Quantization

### Download a Pre-Trained Model

We begin by loading a pre-trained [YOLOv8n](https://huggingface.co/SSI-DNN/pytorch_yolov8n_inst_seg_640x640) instance segmentation model. This implementation is based on [Ultralytics](https://github.com/ultralytics/ultralytics) and includes slightly modified versions of the YOLOv8 detection and segmentation heads, adapted for model quantization. For further details on the model's implementation, please refer to [MCT Models Garden - yolov8](https://github.com/sony/model_optimization/tree/main/tutorials/mct_model_garden/models_pytorch/yolov8).

In [None]:
from tutorials.mct_model_garden.models_pytorch.yolov8.yolov8 import ModelPyTorch, yaml_load
cfg_dict = yaml_load("./tutorials/mct_model_garden/models_pytorch/yolov8/yolov8-seg.yaml", append_filename=True)  # model dict
model = ModelPyTorch.from_pretrained("SSI-DNN/pytorch_yolov8n_inst_seg_640x640", cfg=cfg_dict, mode='segmentation')

### Post training quantization using Model Compression Toolkit

Now, we're all set to use MCT's post-training quantization. To begin, we define a representative dataset and then proceed with model quantization. Please note that, for demonstration purposes, we use the evaluation dataset as our representative dataset. We calibrate the model using 80 representative images, divided into 20 iterations (`n_iters`) of `BATCH_SIZE` (4) images each.
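MCT consumes the representative dataset as a callable that returns an iterator, where each item is a list of input batches. The sketch below mimics the shape of that contract with random NumPy data in place of COCO images, assuming the channels-first `[Batch, C, H, W]` layout produced by `yolov8_preprocess_chw_transpose`. The helper name `make_synthetic_representative_dataset` is hypothetical and not part of MCT or the Models Garden. It also confirms the calibration arithmetic: `n_iters * BATCH_SIZE` = 20 × 4 = 80 images.

```python
import numpy as np
from typing import Callable, Iterator, List


def make_synthetic_representative_dataset(n_iter: int, batch_size: int,
                                          channels: int = 3, height: int = 640,
                                          width: int = 640, seed: int = 0
                                          ) -> Callable[[], Iterator[List[np.ndarray]]]:
    """Hypothetical helper: return a callable that yields `n_iter` random batches.

    Each yielded item is a list holding one float32 array of shape
    [batch_size, channels, height, width] (channels-first layout).
    Random data stands in for preprocessed COCO images.
    """
    def representative_dataset() -> Iterator[List[np.ndarray]]:
        # Re-seed on every call so each call replays the same batches from the start.
        rng = np.random.default_rng(seed)
        for _ in range(n_iter):
            batch = rng.random((batch_size, channels, height, width), dtype=np.float32)
            yield [batch]

    return representative_dataset


# Small spatial size keeps the demo light; the tutorial's model expects 640x640.
gen = make_synthetic_representative_dataset(n_iter=20, batch_size=4, height=64, width=64)
batches = list(gen())
print(len(batches), batches[0][0].shape)
print(sum(b[0].shape[0] for b in batches))  # total calibration images
```

Returning a fresh iterator on every call matters because the tutorial passes the same generator to several MCT calls (resource utilization, quantization and export), each of which iterates over it.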

Additionally, to further compress the model's memory footprint, we employ mixed-precision quantization. This method allows each layer's weights to be quantized with a different bit-width (2, 4, or 8 bits), in line with the IMX500 target platform capabilities. Here, the weights memory target is set to 75% of that of standard 8-bit quantization.
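The weights memory target set in the next cell (75% of the 8-bit baseline) is plain arithmetic on bits per parameter: memory in bytes is the parameter count times the bit-width divided by 8, so the budget is equivalent to an average of at most 6 bits per weight. The sketch below uses hypothetical layer names and parameter counts (not the real YOLOv8n layers) and checks one hand-picked bit assignment against that budget. It does not reproduce MCT's mixed-precision search, which selects the per-layer bit-widths itself.

```python
from typing import Dict


def weights_memory_bytes(param_counts: Dict[str, int], bits: Dict[str, int]) -> float:
    """Total weight memory in bytes when each layer uses its own bit-width."""
    return sum(param_counts[name] * bits[name] / 8 for name in param_counts)


# Hypothetical per-layer parameter counts, for illustration only.
params = {"conv1": 1_000, "conv2": 20_000, "head": 5_000}

baseline = weights_memory_bytes(params, {name: 8 for name in params})  # all layers at 8 bits
budget = 0.75 * baseline  # same 75% ratio as the tutorial's resource target

# One candidate mixed-precision assignment: drop the largest layer to 4 bits.
candidate = {"conv1": 8, "conv2": 4, "head": 8}
used = weights_memory_bytes(params, candidate)
print(baseline, budget, used, used <= budget)  # 26000.0 19500.0 16000.0 True
```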

In [None]:
import model_compression_toolkit as mct
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import coco_dataset_generator
from tutorials.mct_model_garden.models_pytorch.yolov8.yolov8_preprocess import yolov8_preprocess_chw_transpose
from typing import Iterator, Tuple, List

REPRESENTATIVE_DATASET_FOLDER = './coco/val2017/'
REPRESENTATIVE_DATASET_ANNOTATION_FILE = './coco/annotations/instances_val2017.json'
BATCH_SIZE = 4
n_iters = 20

# Load representative dataset
representative_dataset = coco_dataset_generator(dataset_folder=REPRESENTATIVE_DATASET_FOLDER,
                                                annotation_file=REPRESENTATIVE_DATASET_ANNOTATION_FILE,
                                                preprocess=yolov8_preprocess_chw_transpose,
                                                batch_size=BATCH_SIZE)

# Define representative dataset generator
def get_representative_dataset(n_iter: int, dataset_loader: Iterator[Tuple]):
    """
    This function creates a representative dataset generator. The generator yields numpy
        arrays of batches of shape: [Batch, C, H, W].
    Args:
        n_iter: number of iterations for MCT to calibrate on
        dataset_loader: iterable whose items hold a batch of preprocessed images at index 0
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        ds_iter = iter(dataset_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]

    return representative_dataset

# Get representative dataset generator
representative_dataset_gen = get_representative_dataset(n_iter=n_iters,
                                                        dataset_loader=representative_dataset)

# Set IMX500-v1 TPC
tpc = mct.get_target_platform_capabilities(fw_name="pytorch",
                                           target_platform_name='imx500',
                                           target_platform_version='v1')

# Specify the necessary configuration for mixed precision quantization. To keep the tutorial brief, we'll use a small set of images and omit the hessian metric for mixed precision calculations. It's important to be aware that this choice may impact the resulting accuracy.
mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=5,
                                                      use_hessian_based_scores=False)
config = mct.core.CoreConfig(mixed_precision_config=mp_config,
                             quantization_config=mct.core.QuantizationConfig(shift_negative_activation_correction=True))

# Define target Resource Utilization for mixed precision weights quantization (75% of 'standard' 8bits quantization)
resource_utilization_data = mct.core.pytorch_resource_utilization_data(in_model=model,
                                                                       representative_data_gen=
                                                                       representative_dataset_gen,
                                                                       core_config=config,
                                                                       target_platform_capabilities=tpc)
resource_utilization = mct.core.ResourceUtilization(weights_memory=resource_utilization_data.weights_memory * 0.75)

# Perform post training quantization
quant_model, _ = mct.ptq.pytorch_post_training_quantization(in_module=model,
                                                            representative_data_gen=
                                                            representative_dataset_gen,
                                                            target_resource_utilization=resource_utilization,
                                                            core_config=config,
                                                            target_platform_capabilities=tpc)


### Model Export

Now, we can export the quantized model, ready for deployment, into a `.onnx` format file. Please ensure that the `save_model_path` has been set correctly.

In [None]:
import model_compression_toolkit as mct

mct.exporter.pytorch_export_model(model=quant_model,
                                  save_model_path='./quant_model.onnx',
                                  repr_dataset=representative_dataset_gen)

### Gradient-Based Post Training Quantization using Model Compression Toolkit
Here we demonstrate how to further improve the quantized model's accuracy using the gradient-based PTQ (GPTQ) technique.
**Please note that this section is computationally heavy, and it's recommended to run it on a GPU. For fast deployment, you may choose to skip this step.**
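Roughly speaking, gradient-based PTQ fine-tunes how each weight is rounded onto the quantization grid so that the quantized layer outputs track the floating-point outputs on representative data, instead of always rounding to the nearest grid point. The toy below illustrates that objective on a hypothetical one-output linear layer with a hypothetical step size. Note that it does not use gradients: it exhaustively searches every floor/ceil rounding choice, which is only feasible for a handful of weights, whereas GPTQ optimizes the rounding with gradient descent.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 6))  # hypothetical representative inputs
w = rng.normal(size=6)         # float weights of a toy 1-output linear layer
scale = 0.5                    # hypothetical quantization step size


def output_mse(w_q: np.ndarray) -> float:
    """Mean squared error between the float and quantized layer outputs."""
    return float(np.mean((x @ w - x @ w_q) ** 2))


# Baseline: round every weight to the nearest grid point.
w_nearest = np.round(w / scale) * scale
lo, hi = np.floor(w / scale), np.ceil(w / scale)

best_mse, best_wq = output_mse(w_nearest), w_nearest
# Try every floor/ceil combination (2**6 = 64 candidates) and keep the best.
for choice in itertools.product([0, 1], repeat=len(w)):
    w_q = np.where(np.array(choice) == 1, hi, lo) * scale
    mse = output_mse(w_q)
    if mse < best_mse:
        best_mse, best_wq = mse, w_q

print(f"round-to-nearest MSE: {output_mse(w_nearest):.4f}")
print(f"best rounding MSE:    {best_mse:.4f}")
```

Because round-to-nearest is one of the candidates, the searched rounding can never be worse on these inputs; it is often better because rounding errors of different weights can cancel in the output.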

We start by loading the COCO training set (about 18 GB of images) and redefining the representative dataset accordingly.

In [None]:
!wget -nc http://images.cocodataset.org/zips/train2017.zip
!unzip -q -o train2017.zip -d ./coco
!echo Done loading train2017 images

GPTQ_REPRESENTATIVE_DATASET_FOLDER = './coco/train2017/'
GPTQ_REPRESENTATIVE_DATASET_ANNOTATION_FILE = './coco/annotations/instances_train2017.json'
BATCH_SIZE = 4
n_iters = 20

# Load representative dataset
gptq_representative_dataset = coco_dataset_generator(dataset_folder=GPTQ_REPRESENTATIVE_DATASET_FOLDER,
                                                     annotation_file=GPTQ_REPRESENTATIVE_DATASET_ANNOTATION_FILE,
                                                     preprocess=yolov8_preprocess_chw_transpose,
                                                     batch_size=BATCH_SIZE)

# Get representative dataset generator
gptq_representative_dataset_gen = get_representative_dataset(n_iter=n_iters,
                                                             dataset_loader=gptq_representative_dataset)

Next, we set up the gradient-based PTQ configuration and run the corresponding MCT command. Keep in mind that this step can be time-consuming, depending on your runtime. For best results, we recommend increasing `n_gptq_epochs` to 1000.

In [None]:
# Specify the necessary configuration for Gradient-Based PTQ.
n_gptq_epochs = 15 # for best results increase this value to 1000
gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=n_gptq_epochs, use_hessian_based_weights=False)

# Perform Gradient-Based Post Training Quantization
gptq_quant_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
    model=model,
    representative_data_gen=gptq_representative_dataset_gen,
    target_resource_utilization=resource_utilization,
    gptq_config=gptq_config,
    core_config=config,
    target_platform_capabilities=tpc)

### GPTQ Model Export

Now, we can export the GPTQ-quantized model, ready for deployment, into a `.onnx` format file. Please ensure that the `save_model_path` has been set correctly. The exported ONNX model can then be converted to the IMX500 format with sdsp.

In [None]:
mct.exporter.pytorch_export_model(model=gptq_quant_model,
                                  save_model_path='./qmodel_gptq.onnx',
                                  repr_dataset=gptq_representative_dataset_gen)

## Evaluation on COCO dataset

### Floating point model evaluation
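Next, we evaluate the floating-point model using the `pycocotools` COCO evaluation API (`cocoeval`) together with additional dataset utilities, and verify that its mAP aligns with that of the original model.
Please ensure that the dataset path has been set correctly before running this code cell. Adjust `img_ids_limit` based on your runtime; with `img_ids_limit=100` only a 100-image subset of val2017 is evaluated, so the resulting mAP can differ from the full-dataset figure.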
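Mask mAP is computed on binary masks: COCO matches each predicted mask to a ground-truth mask by their intersection over union (IoU), averaging over IoU thresholds from 0.50 to 0.95. The sketch below shows those two ingredients on a tiny hand-made 3×3 example, assuming that the `mask_thresh` argument of `evaluate_yolov8_segmentation` binarizes the model's soft mask values before scoring (an assumption about that helper, not confirmed here). The function names `binarize_mask` and `mask_iou` are illustrative.

```python
import numpy as np


def binarize_mask(prob_mask: np.ndarray, mask_thresh: float) -> np.ndarray:
    """Turn a soft mask (values in [0, 1]) into a boolean mask."""
    return prob_mask > mask_thresh


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


prob = np.array([[0.9, 0.6, 0.2],
                 [0.7, 0.5, 0.1],
                 [0.3, 0.56, 0.0]])
gt = np.array([[1, 1, 0],
               [1, 1, 0],
               [0, 0, 0]], dtype=bool)

pred = binarize_mask(prob, mask_thresh=0.55)
print(pred.astype(int))
print(mask_iou(pred, gt))  # 3 overlapping pixels / 5 pixels in the union = 0.6
```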

In [None]:
from tutorials.mct_model_garden.models_pytorch.yolov8.yolov8 import seg_model_predict
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import evaluate_yolov8_segmentation
from model_compression_toolkit.core.pytorch.pytorch_device_config import get_working_device
device = get_working_device()
model = model.to(device)
evaluate_yolov8_segmentation(model, seg_model_predict, data_dir='coco', data_type='val2017',
                             img_ids_limit=100, output_file='results.json',
                             iou_thresh=0.7, conf=0.001, max_dets=300, mask_thresh=0.55)

### Quantized model evaluation
Next, we evaluate the quantized model. Expect a slight drop in accuracy compared to the floating-point model; it can be further mitigated by either expanding the representative dataset or employing MCT's advanced quantization methods, such as GPTQ (Gradient-Based/Enhanced Post Training Quantization).

In [None]:
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import evaluate_yolov8_segmentation
evaluate_yolov8_segmentation(quant_model, seg_model_predict, data_dir='coco', data_type='val2017',
                             img_ids_limit=100, output_file='results_quant.json',
                             iou_thresh=0.7, conf=0.001, max_dets=300, mask_thresh=0.55)

### GPTQ Model Evaluation
Finally, we evaluate the model quantized with GPTQ (Gradient-Based/Enhanced Post Training Quantization). We anticipate improved accuracy compared to the model quantized with PTQ.

In [None]:
from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import evaluate_yolov8_segmentation
evaluate_yolov8_segmentation(gptq_quant_model, seg_model_predict, data_dir='coco', data_type='val2017',
                             img_ids_limit=100, output_file='results_g_quant.json',
                             iou_thresh=0.7, conf=0.001, max_dets=300, mask_thresh=0.55)

### Visualize Predictions

Finally, we can visualize the predictions. The code cell below displays a random set of images, showing the ground-truth annotations next to the predictions used for evaluation.
To view the output of a different model, run the evaluation for that model and set the results file name below to the corresponding `output_file` (e.g. `results_quant.json` or `results_g_quant.json`).
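The evaluation cells use `conf=0.001`, so the results files keep many low-confidence detections that can clutter the plots. One option, sketched below, is to group detections by image and drop those under a display threshold before plotting. The helper name `detections_by_image` and the 0.25 threshold are illustrative choices, and the toy detections are made up; they only follow the COCO results format (`bbox` is `[x, y, width, height]`).

```python
from collections import defaultdict
from typing import Dict, List


def detections_by_image(results: List[dict], min_score: float) -> Dict[int, List[dict]]:
    """Group COCO-format detections by image_id, keeping only those with score >= min_score."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for det in results:
        if det["score"] >= min_score:
            grouped[det["image_id"]].append(det)
    return dict(grouped)


# Toy detections in COCO results format.
toy_results = [
    {"image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40], "score": 0.92},
    {"image_id": 1, "category_id": 3, "bbox": [50, 60, 20, 20], "score": 0.004},
    {"image_id": 2, "category_id": 18, "bbox": [0, 0, 100, 80], "score": 0.41},
]
filtered = detections_by_image(toy_results, min_score=0.25)
print({k: len(v) for k, v in filtered.items()})  # {1: 1, 2: 1}
```

Since `pycocotools`' `COCO.loadRes` also accepts a list of result dicts instead of a file name, the kept detections can be flattened (`[d for dets in filtered.values() for d in dets]`) and passed to it directly.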

In [None]:
import cv2
import numpy as np
from matplotlib import pyplot as plt
from pycocotools.coco import COCO
import json
import random

# Number of images to display
num_sets = 20

# Results file to visualize: 'results.json' (float), 'results_quant.json' (PTQ) or 'results_g_quant.json' (GPTQ)
resultsFile = 'results.json'
with open(resultsFile, 'r') as file:
    results = json.load(file)

# Extract unique image IDs from the results and sample them without repetition
result_imgIds = list({result['image_id'] for result in results})
selected_imgIds = random.sample(result_imgIds, min(num_sets, len(result_imgIds)))
num_sets = len(selected_imgIds)

dataDir = 'coco'
dataType = 'val2017'
annFile = f'{dataDir}/annotations/instances_{dataType}.json'
cocoGt = COCO(annFile)
cocoDt = cocoGt.loadRes(resultsFile)
plt.figure(figsize=(15, 7 * num_sets))

for i, random_imgId in enumerate(selected_imgIds):
    img = cocoGt.loadImgs(random_imgId)[0]
    image_path = f'{dataDir}/{dataType}/{img["file_name"]}'
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert from BGR to RGB

    plt.subplot(num_sets, 2, 2*i + 1)
    plt.imshow(image)
    plt.axis('off')
    plt.title(f'Ground Truth {random_imgId}')

    # Load and display ground truth annotations with bounding boxes
    annIds = cocoGt.getAnnIds(imgIds=img['id'], iscrowd=None)
    anns = cocoGt.loadAnns(annIds)
    for ann in anns:
        cocoGt.showAnns([ann], draw_bbox=True)
        # Draw category ID on the image
        bbox = ann['bbox']
        plt.text(bbox[0], bbox[1], str(ann['category_id']), color='white', fontsize=12, bbox=dict(facecolor='red', alpha=0.5))

    plt.subplot(num_sets, 2, 2*i + 2)
    plt.imshow(image)
    plt.axis('off')
    plt.title(f'Model Output {random_imgId}')

    # Load and display model predictions with bounding boxes
    annIdsDt = cocoDt.getAnnIds(imgIds=img['id'])
    annsDt = cocoDt.loadAnns(annIdsDt)
    for ann in annsDt:
        cocoDt.showAnns([ann], draw_bbox=True)
        # Draw category ID on the image
        bbox = ann['bbox']
        plt.text(bbox[0], bbox[1], str(ann['category_id']), color='white', fontsize=12, bbox=dict(facecolor='blue', alpha=0.5))

plt.tight_layout()
plt.show()

### Summary

In this notebook, we loaded the weights of a YOLOv8n instance segmentation model, quantized it with both PTQ and gradient-based PTQ (GPTQ), evaluated the floating-point and quantized models on COCO, and finally visualized their predictions.

Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.