# Custom Layer Usage Example - multiclass_nms Layer

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/custom_layers/blob/main/tutorials/pytorch/multiclass_nms_custom_layer_example.ipynb)
 

## Overview

In this tutorial, we illustrate how to integrate a custom layer with model quantization using the [MCT](https://github.com/sony/model_optimization) (Model Compression Toolkit) library.
Using a simple object detection model as an example, we apply post-training quantization and then attach a custom NMS (non-maximum suppression) layer to the quantized model.

The process consists of the following steps:

1. Quantize your pre-trained model with MCT. Make sure the model does not already include the operation you plan to replace with a custom layer.
2. Attach the custom layer to the quantized model.
3. Export the modified model for deployment.

## Setup

### Install & import relevant packages

```python
!pip install -q sony-custom-layers[torch] model_compression_toolkit
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Iterator, List
import model_compression_toolkit as mct
from sony_custom_layers.pytorch.nms import multiclass_nms
```

## Model Quantization

### Create Model Instance

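We start by creating a simple object detection model as an example. You can replace it with your own pre-trained model, but make sure it does not already include an NMS operation. The model returns boxes of shape [batch, n_boxes, 4] and per-class scores of shape [batch, n_boxes, n_classes], which are the inputs `multiclass_nms` expects. Note that the model's `max_detections` argument sets the number of candidate boxes (`n_boxes`) it predicts, not the number of detections kept after NMS.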

```python
class ObjectDetector(nn.Module):
    def __init__(self, num_classes=2, max_detections=300):
        super().__init__()
        # Number of candidate boxes predicted per image (before NMS)
        self.max_detections = max_detections

        # Two conv blocks; each MaxPool2d halves the spatial size (64x64 -> 16x16)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )

        # 1x1 conv heads: 4 box coordinates and num_classes scores per candidate box
        self.bbox_reg = nn.Conv2d(32, 4 * max_detections, kernel_size=1)
        self.class_reg = nn.Conv2d(32, num_classes * max_detections, kernel_size=1)

    def forward(self, x):
        batch_size = x.size(0)
        features = self.backbone(x)
        H_prime = features.shape[2]
        W_prime = features.shape[3]

        # Boxes: [batch, max_detections, 4], averaged over spatial positions
        bbox = self.bbox_reg(features)
        bbox = bbox.view(batch_size, self.max_detections, 4, H_prime * W_prime).mean(dim=3)
        # Scores: [batch, max_detections, num_classes], softmax over the class axis
        class_probs = self.class_reg(features).view(batch_size, self.max_detections, -1, H_prime * W_prime)
        class_probs = F.softmax(class_probs.mean(dim=3), dim=2)
        return bbox, class_probs

model = ObjectDetector()
model.eval()
```
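
As a quick sanity check of the output contract, the sketch below traces tensor shapes through `ObjectDetector` using plain integer arithmetic, so it runs without PyTorch. The helpers `conv_out` and `detector_output_shapes` are hypothetical names introduced for illustration; `conv_out` uses the standard output-size formula for `nn.Conv2d` and `nn.MaxPool2d` with dilation 1.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for Conv2d/MaxPool2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def detector_output_shapes(batch: int, height: int, width: int,
                           num_classes: int = 2,
                           max_detections: int = 300) -> Tuple[Shape, Shape, Shape]:
    """Trace the shapes produced by ObjectDetector for an input of [batch, 3, height, width]."""
    h, w = height, width
    # Backbone: two blocks of Conv2d(k=3, padding=1) -> ReLU -> MaxPool2d(2, 2)
    for _ in range(2):
        h, w = conv_out(h, 3, padding=1), conv_out(w, 3, padding=1)  # size preserved
        h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)    # size halved (floor)
    feature_map = (batch, 32, h, w)
    # The 1x1 heads keep h x w; forward() then averages over the h*w positions,
    # so the output shapes no longer depend on the input resolution
    boxes = (batch, max_detections, 4)
    scores = (batch, max_detections, num_classes)
    return feature_map, boxes, scores


print(detector_output_shapes(1, 64, 64))
# ((1, 32, 16, 16), (1, 300, 4), (1, 300, 2))
```

For the 64x64 inputs used in this tutorial, the backbone yields a 16x16 feature map, and the two outputs match the [batch, n_boxes, 4] and [batch, n_boxes, n_classes] layouts passed to NMS later on.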

### Post-Training Quantization using Model Compression Toolkit

We're now ready to apply MCT's post-training quantization (PTQ).
First, we define a representative dataset generator. For demonstration purposes, it generates random data of the desired image shape instead of using real images.
Then, we apply PTQ to our model using this generator. For more details on using MCT, refer to the [MCT tutorials](https://github.com/sony/model_optimization/tree/main/tutorials).

```python
NUM_ITERS = 20
BATCH_SIZE = 32

def get_representative_dataset(n_iter: int):
    """
    Create a representative dataset generator. The generator yields lists containing
        a single torch tensor batch of shape: [Batch, C, H, W].
    Args:
        n_iter: number of iterations (batches) for MCT to calibrate on
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        for _ in range(n_iter):
            # Random images in [0, 1); use real, preprocessed images for actual calibration
            yield [torch.rand(BATCH_SIZE, 3, 64, 64)]

    return representative_dataset

representative_data_generator = get_representative_dataset(n_iter=NUM_ITERS)

# The second returned value is not needed here
quant_model, _ = mct.ptq.pytorch_post_training_quantization(model, representative_data_gen=representative_data_generator)
print('Quantized model is ready')
```
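
The generator above feeds random data, which is fine for a demo but yields calibration statistics unrelated to real images. A minimal sketch of a generator over your own data might look like this, assuming your images are already preprocessed into a float32 NumPy array of shape [N, 3, 64, 64]. `make_representative_dataset` is a hypothetical helper, and it yields NumPy batches; if your MCT version expects torch tensors, wrap each batch with `torch.from_numpy(...)`.

```python
from typing import Callable, Iterator, List

import numpy as np


def make_representative_dataset(images: np.ndarray, batch_size: int,
                                n_iter: int) -> Callable[[], Iterator[List[np.ndarray]]]:
    """Build a representative dataset generator over preprocessed images.

    `images` is assumed to be a float32 array of shape [N, 3, H, W] that went
    through the same preprocessing used at inference time.
    """
    num_images = images.shape[0]

    def representative_dataset() -> Iterator[List[np.ndarray]]:
        start = 0
        for _ in range(n_iter):
            # Take the next batch, wrapping around to the beginning when data runs out
            idx = [(start + i) % num_images for i in range(batch_size)]
            start = (start + batch_size) % num_images
            yield [images[idx]]

    return representative_dataset


# Example with a stand-in array; replace it with your own preprocessed images
calib_images = np.random.rand(100, 3, 64, 64).astype(np.float32)
calib_gen = make_representative_dataset(calib_images, batch_size=32, n_iter=20)
print(next(calib_gen())[0].shape)  # (32, 3, 64, 64)
```

Because `start` is local to each call, every call to the returned function restarts from the first image, so MCT can iterate over the dataset more than once.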

## Custom Layer Stitching

Now that we have a quantized model, we can add a custom layer to it. In this example, we add an NMS layer by creating a model wrapper that applies `multiclass_nms` to the outputs of the quantized model. You can reuse this wrapper with your own model.

Note that in this case, the `multiclass_nms` custom layer is the final layer. If the custom layer outputs indices, like the `multiclass_nms_with_indices` layer, you may find the `torch.gather` operation useful for selecting the required output data based on those indices. You can incorporate this data selection operation into this wrapper as well.
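
A minimal sketch of that selection step follows, assuming a layer such as `multiclass_nms_with_indices` returns an integer tensor of shape [batch, max_detections] that indexes into the box axis of the model outputs (check the `sony_custom_layers` documentation for the exact output fields, shapes, and dtype). `gather_by_indices` is a hypothetical helper, and `per_box_data` stands for any per-box tensor of shape [batch, n_boxes, k] you want to carry through NMS.

```python
import torch


def gather_by_indices(per_box_data: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Select rows of per-box data using (assumed) NMS output indices.

    per_box_data: [batch, n_boxes, k]
    indices:      [batch, max_detections], integer indices into the n_boxes axis
    returns:      [batch, max_detections, k]
    """
    k = per_box_data.shape[-1]
    # torch.gather needs an index tensor with as many dims as the input,
    # so repeat each box index across the last (feature) axis
    expanded = indices.long().unsqueeze(-1).expand(-1, -1, k)
    return torch.gather(per_box_data, dim=1, index=expanded)


data = torch.arange(2 * 4 * 3).reshape(2, 4, 3)
idx = torch.tensor([[3, 0], [1, 1]])
print(gather_by_indices(data, idx).shape)  # torch.Size([2, 2, 3])
```

If the NMS outputs are padded to `max_detections`, the padded slots still gather some row, so mask them using the number of valid detections.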

```python
class PostProcessWrapper(nn.Module):
    def __init__(self,
                 model: nn.Module,
                 score_threshold: float = 0.001,
                 iou_threshold: float = 0.7,
                 max_detections: int = 20):

        super().__init__()
        self.model = model
        self.score_threshold = score_threshold
        self.iou_threshold = iou_threshold
        self.max_detections = max_detections

    def forward(self, images):
        # Quantized model inference
        outputs = self.model(images)

        # boxes: [batch, n_boxes, 4], scores: [batch, n_boxes, n_classes]
        boxes = outputs[0]
        scores = outputs[1]
        # Multi-class NMS is the final layer of the wrapped model
        nms = multiclass_nms(boxes=boxes, scores=scores, score_threshold=self.score_threshold,
                             iou_threshold=self.iou_threshold, max_detections=self.max_detections)
        return nms

device = "cuda" if torch.cuda.is_available() else "cpu"
quant_model_with_nms = PostProcessWrapper(model=quant_model,
                                          score_threshold=0.001,
                                          iou_threshold=0.7,
                                          max_detections=20).to(device=device)
print('Quantized model with NMS is ready')
```
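
To illustrate what `score_threshold`, `iou_threshold`, and `max_detections` control, here is a simplified pure-Python reference of greedy per-class NMS for a single image. It is not the `sony_custom_layers` implementation: batching, zero-padding to `max_detections`, and the `n_valid` output are omitted, and the exact threshold comparisons (strict `>` for scores, suppression when IoU exceeds the threshold) are assumptions. `iou` and `reference_multiclass_nms` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reference_multiclass_nms(boxes: Sequence[Box], scores: Sequence[Sequence[float]],
                             score_threshold: float, iou_threshold: float,
                             max_detections: int) -> List[Tuple[Box, float, int]]:
    """Greedy per-class NMS for one image; scores[i][c] is box i's score for class c.

    Returns up to max_detections (box, score, label) tuples in descending score order.
    """
    num_classes = len(scores[0]) if scores else 0
    kept = []
    for c in range(num_classes):
        # Boxes passing the score threshold for this class, highest score first
        candidates = sorted((i for i in range(len(boxes)) if scores[i][c] > score_threshold),
                            key=lambda i: scores[i][c], reverse=True)
        selected = []
        for i in candidates:
            # Keep a box only if it does not overlap an already kept box of this class too much
            if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
                selected.append(i)
        kept.extend((boxes[i], scores[i][c], c) for i in selected)
    # Merge all classes and keep the top max_detections overall
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_detections]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [[0.9, 0.1], [0.8, 0.7], [0.05, 0.6]]
dets = reference_multiclass_nms(boxes, scores, score_threshold=0.2,
                                iou_threshold=0.5, max_detections=10)
print([(label, score) for _, score, label in dets])  # [(0, 0.9), (1, 0.7), (1, 0.6)]
```

In this example the second box is suppressed for class 0 (IoU with the first box is about 0.68) but kept for class 1, since suppression only happens within a class. A low `score_threshold` such as 0.001 keeps almost every candidate, leaving `iou_threshold` and `max_detections` to do most of the filtering.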

### Model Export

Finally, we export the quantized model, including the NMS layer, to an ONNX file. Make sure `model_path` (passed to the exporter as `save_model_path`) is set to the desired location.
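
If you change `model_path` to a different location, a small check like the following can catch path mistakes before the export runs. `prepare_export_path` is a hypothetical helper, not part of MCT: it resolves the path, requires a `.onnx` extension, and creates the parent directory if it does not exist.

```python
import os


def prepare_export_path(path: str) -> str:
    """Return an absolute .onnx path whose parent directory exists."""
    path = os.path.abspath(path)
    if not path.endswith(".onnx"):
        raise ValueError(f"expected a .onnx file path, got {path!r}")
    # Create the destination directory if it is missing
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```

For example, `model_path = prepare_export_path('./exported/qmodel_with_nms.onnx')` before calling `mct.exporter.pytorch_export_model`.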

```python
model_path = './qmodel_with_nms.onnx'
mct.exporter.pytorch_export_model(model=quant_model_with_nms,
                                  save_model_path=model_path,
                                  repr_dataset=representative_data_generator)
```

### Model Inference

To run inference with the saved ONNX model, we first load the required custom operations with `load_custom_ops()`, and then create an onnxruntime inference session that uses them.


```python
import onnxruntime as ort
import numpy as np
from sony_custom_layers.pytorch import load_custom_ops

# Random input with the model's expected shape [1, 3, 64, 64]
random_input = np.random.rand(1, 3, 64, 64).astype(np.float32)

# Session options with the custom ops registered
so = load_custom_ops()
session = ort.InferenceSession(model_path, sess_options=so)
input_name = session.get_inputs()[0].name
output_names = [output.name for output in session.get_outputs()]
preds = session.run(output_names, {input_name: random_input})

# The NMS outputs can be accessed as follows:
boxes, scores, labels, n_valid = preds
```
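
The NMS outputs have a fixed size of `max_detections` per image. Assuming, as the `n_valid` output suggests, that only the first `n_valid` entries of each image are real detections and the rest is padding, a minimal sketch for extracting them could look like this. `valid_detections` is a hypothetical helper; the assumed shapes are listed in its docstring.

```python
import numpy as np


def valid_detections(boxes, scores, labels, n_valid):
    """Split padded, batched NMS outputs into per-image valid detections.

    Assumed shapes: boxes [B, max_det, 4], scores [B, max_det],
    labels [B, max_det], n_valid [B] or [B, 1].
    """
    counts = np.asarray(n_valid).reshape(-1).astype(int)
    results = []
    for b, n in enumerate(counts):
        # Only the first n entries of image b are real detections; the rest is padding
        results.append((boxes[b, :n], scores[b, :n], labels[b, :n]))
    return results
```

Calling `valid_detections(*preds)` on the session outputs gives one `(boxes, scores, labels)` tuple per image in the batch.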

Copyright 2025 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.