# Deploying RetinaNet with Detectron2 and ONNX
**Jin Yeom**

In this notebook, we experiment with deploying the Detectron2 implementation of RetinaNet, paying special attention to how much of it we can convert to ONNX, which is crucial if the model is to be deployed with TensorRT.

In [1]:
from pprint import pprint

In [2]:
import torch as pt
from torch import nn

In [3]:
device = pt.device('cuda' if pt.cuda.is_available() else 'cpu')
print("device =", device)

device = cuda


## RetinaNet

In [4]:
from detectron2.modeling import build_model
from detectron2.config import get_cfg

In [5]:
cfg = get_cfg()
cfg.merge_from_file('/detectron2/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml')

Loading config /detectron2/configs/COCO-Detection/../Base-RetinaNet.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Config '/detectron2/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.


In [6]:
pprint(cfg)

{'CUDNN_BENCHMARK': False,
 'DATALOADER': {'ASPECT_RATIO_GROUPING': True,
                'NUM_WORKERS': 4,
                'REPEAT_THRESHOLD': 0.0,
                'SAMPLER_TRAIN': 'TrainingSampler'},
 'DATASETS': {'PRECOMPUTED_PROPOSAL_TOPK_TEST': 1000,
              'PRECOMPUTED_PROPOSAL_TOPK_TRAIN': 2000,
              'PROPOSAL_FILES_TEST': (),
              'PROPOSAL_FILES_TRAIN': (),
              'TEST': ('coco_2017_val',),
              'TRAIN': ('coco_2017_train',)},
 'GLOBAL': CfgNode({'HACK': 1.0}),
 'INPUT': {'CROP': {'ENABLED': False,
                    'SIZE': [0.9, 0.9],
                    'TYPE': 'relative_range'},
           'FORMAT': 'BGR',
           'MASK_FORMAT': 'polygon',
           'MAX_SIZE_TEST': 1333,
           'MAX_SIZE_TRAIN': 1333,
           'MIN_SIZE_TEST': 800,
           'MIN_SIZE_TRAIN': (640, 672, 704, 736, 768, 800),
           'MIN_SIZE_TRAIN_SAMPLING': 'choice'},
 'MODEL': {'ANCHOR_GENERATOR': {'ANGLES': [[-90, 0, 90]],
                       

In [7]:
model = build_model(cfg).eval()

In [8]:
model

RetinaNet(
  (backbone): FPN(
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelP6P7(
      (p6): Conv2d(2048, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(


I very much doubt that this will work right out of the box, but let's try converting this model to ONNX. For this process, I referenced [this tutorial](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html).

In [9]:
from torch import onnx

In [10]:
opset = 10
input_names = ['fpn_lateral3']
output_names = ['cls_score', 'bbox_pred']
batch_size = 1
channels = 3
image_height = 640
image_width = 640
dummy_input = pt.randn(
    batch_size, 
    channels, 
    image_height, 
    image_width
).to(device)

In [11]:
onnx.export(
    model, 
    dummy_input, # This will break!
    'retinanet.onnx', 
    verbose=True, 
    input_names=input_names, 
    output_names=output_names
)



IndexError: too many indices for tensor of dimension 3

So, what happened here? When I looked into how the input image propagates through our detection model, I learned that the core of this model consists of two parts, `backbone` and `head`; everything else is more or less pre- and post-processing. What is currently inconvenient for us is that the `RetinaNet` module that wraps these two expects specific formats for input and output: it takes a list of dicts (one per image, each holding a `(C, H, W)` tensor under the `image` key) rather than a batched tensor, and in inference mode it returns post-processed detections rather than the raw head outputs. One way to work around this is to implement a wrapper that "extracts" these two modules and combines them into a single exportable module.

In [12]:
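class DeployableDetectron2Model(nn.Module):
    """Wraps only the backbone and head of a Detectron2 RetinaNet for export."""

    def __init__(self, detectron2_model):
        super().__init__()
        self.backbone = detectron2_model.backbone
        self.head = detectron2_model.head
        # Names of the FPN levels that the head consumes.
        self.in_features = detectron2_model.in_features

    def forward(self, x):
        # x is a plain batched image tensor of shape (N, C, H, W);
        # RetinaNet's own input normalization is not applied here.
        features = self.backbone(x)  # dict: level name -> feature map
        features = [features[f] for f in self.in_features]
        # Raw per-level head outputs; no anchor decoding or NMS.
        return self.head(features)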

In [13]:
deployable = DeployableDetectron2Model(model).to(device)
del model
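Before exporting, it helps to have a rough idea of what the wrapped head returns. As far as I understand, the RetinaNet head produces one classification tensor and one box-regression tensor per FPN level, so the wrapper returns two lists rather than two tensors. The sketch below estimates their shapes in plain Python. The level names, strides, 80 classes, and 9 anchors per location are assumptions based on the usual COCO RetinaNet setup (the config printout above is truncated), and `head_output_shapes` is a hypothetical helper, not part of Detectron2.

```python
def head_output_shapes(height, width, batch_size=1, num_classes=80,
                       num_anchors=9, strides=None):
    """Estimate {level: (cls_shape, box_shape)} for each FPN level.

    All defaults are assumptions about the standard COCO RetinaNet config.
    """
    if strides is None:
        # Assumed strides of the FPN levels the head consumes.
        strides = {"p3": 8, "p4": 16, "p5": 32, "p6": 64, "p7": 128}
    shapes = {}
    for level, stride in strides.items():
        # Each stride-2 stage rounds the spatial size up, hence ceiling division.
        h = -(-height // stride)
        w = -(-width // stride)
        cls_shape = (batch_size, num_anchors * num_classes, h, w)
        box_shape = (batch_size, num_anchors * 4, h, w)
        shapes[level] = (cls_shape, box_shape)
    return shapes


for level, (cls_shape, box_shape) in head_output_shapes(640, 640).items():
    print(level, "cls_score:", cls_shape, "bbox_pred:", box_shape)
```

For our 640×640 dummy input, the feature maps range from 80×80 at `p3` down to 5×5 at `p7`. If these assumptions hold, the exported graph has ten outputs (five levels, two tensors each), so the two entries in `output_names` would label only the first two of them.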

In [14]:
deployable

DeployableDetectron2Model(
  (backbone): FPN(
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelP6P7(
      (p6): Conv2d(2048, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): B

Let's try this again!

In [15]:
onnx.export(
    deployable, 
    dummy_input, 
    'retinanet.onnx',
    export_params=True,
    opset_version=opset,
    input_names=input_names, 
    output_names=output_names
)

ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator. 
  "" + str(_export_onnx_opset_version) + ". "


This is a bit problematic. Even if we were to use TensorRT 6 (which is currently the latest version), we're limited to `opset <= 10`. So, we won't really know how much the Resize mismatch described in this warning affects the accuracy of our model until we measure it ourselves.
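One cheap first check, in the spirit of the tutorial referenced earlier, is to run the same input through both PyTorch and ONNX Runtime and compare the raw outputs. The sketch below is a minimal version of that idea, not a definitive test: `flatten_outputs` and `compare_with_onnxruntime` are hypothetical helpers, the tolerances are arbitrary starting points, and I'm assuming the exported graph's outputs come out in the same order as the depth-first flattened PyTorch outputs. It also requires the `onnxruntime` and `numpy` packages, which this notebook has not used so far.

```python
def flatten_outputs(outputs):
    """Flatten nested lists/tuples of outputs into one flat list, depth-first."""
    if isinstance(outputs, (list, tuple)):
        flat = []
        for item in outputs:
            flat.extend(flatten_outputs(item))
        return flat
    return [outputs]


def compare_with_onnxruntime(module, onnx_path, dummy_input,
                             rtol=1e-3, atol=1e-5):
    """Compare PyTorch outputs against ONNX Runtime outputs for one input.

    Raises AssertionError if any output pair differs beyond the tolerances.
    """
    # Imported lazily so flatten_outputs works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    # Reference outputs from the PyTorch module (assumed to be in eval mode).
    expected = flatten_outputs(module(dummy_input))

    # Run the exported graph on CPU with the same input values.
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    actual = session.run(None, {input_name: dummy_input.cpu().numpy()})

    assert len(expected) == len(actual), "output count mismatch"
    for want, got in zip(expected, actual):
        np.testing.assert_allclose(
            want.detach().cpu().numpy(), got, rtol=rtol, atol=atol
        )
```

With the objects defined above, the call would be `compare_with_onnxruntime(deployable, 'retinanet.onnx', dummy_input)`. Note that this only tells us how ONNX Runtime's opset 10 Resize behaves; TensorRT may still differ, so it complements rather than replaces the evaluation below.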

## Baseline evaluation

Here, we're going to evaluate the baseline performance of RetinaNet, before conversion to ONNX.

In [16]:
from detectron2.evaluation import COCOEvaluator

In [None]:
evaluator = COCOEvaluator(
    # TODO @jinyeom:
    #   Implement the baseline evaluator
)