# Deploying RetinaNet with Detectron2 and ONNX
**Jin Yeom**

In this notebook, we experiment with deploying the Detectron2 implementation of RetinaNet, paying special attention to how much of the model we can convert to ONNX, which is crucial when the model is to be deployed with TensorRT.

In [1]:
from pprint import pprint

In [2]:
import torch as pt
from torch import nn

In [3]:
device = pt.device('cuda' if pt.cuda.is_available() else 'cpu')
print("device =", device)

device = cuda


## RetinaNet

In [4]:
from detectron2.modeling import build_model
from detectron2.config import get_cfg
from detectron2.checkpoint import DetectionCheckpointer

In [5]:
cfg = get_cfg()
cfg.merge_from_file('/root/cvdev/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml')

Loading config /root/cvdev/configs/COCO-Detection/../Base-RetinaNet.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Config '/root/cvdev/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.


Download and add the trained model weights.

In [6]:
!wget https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_1x/137593951/model_final_b796dc.pkl -O model_final.pkl

--2019-10-22 19:04:56--  https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_1x/137593951/model_final_b796dc.pkl
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.20.22.166, 104.20.6.166, 2606:4700:10::6814:6a6, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.20.22.166|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 152130042 (145M) [application/octet-stream]
Saving to: 'model_final.pkl'


2019-10-22 19:05:04 (20.8 MB/s) - 'model_final.pkl' saved [152130042/152130042]



In [7]:
pprint(cfg)

{'CUDNN_BENCHMARK': False,
 'DATALOADER': {'ASPECT_RATIO_GROUPING': True,
                'FILTER_EMPTY_ANNOTATIONS': True,
                'NUM_WORKERS': 4,
                'REPEAT_THRESHOLD': 0.0,
                'SAMPLER_TRAIN': 'TrainingSampler'},
 'DATASETS': {'PRECOMPUTED_PROPOSAL_TOPK_TEST': 1000,
              'PRECOMPUTED_PROPOSAL_TOPK_TRAIN': 2000,
              'PROPOSAL_FILES_TEST': (),
              'PROPOSAL_FILES_TRAIN': (),
              'TEST': ('coco_2017_val',),
              'TRAIN': ('coco_2017_train',)},
 'GLOBAL': CfgNode({'HACK': 1.0}),
 'INPUT': {'CROP': {'ENABLED': False,
                    'SIZE': [0.9, 0.9],
                    'TYPE': 'relative_range'},
           'FORMAT': 'BGR',
           'MASK_FORMAT': 'polygon',
           'MAX_SIZE_TEST': 1333,
           'MAX_SIZE_TRAIN': 1333,
           'MIN_SIZE_TEST': 800,
           'MIN_SIZE_TRAIN': (640, 672, 704, 736, 768, 800),
           'MIN_SIZE_TRAIN_SAMPLING': 'choice'},
 'MODEL': {'ANCHOR_GENERATOR': 

In [8]:
model = build_model(cfg).eval()
DetectionCheckpointer(model).load('model_final.pkl')

{'__author__': 'Detectron2 Model Zoo'}

In [9]:
model

RetinaNet(
  (backbone): FPN(
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelP6P7(
      (p6): Conv2d(2048, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(


I very much doubt that this will work right out of the box, but let's try converting this model to ONNX. For this process, I referenced [this tutorial](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html).

In [10]:
opset = 10
input_names = ['fpn_lateral3']
output_names = ['cls_score', 'bbox_pred']
batch_size = 1
channels = 3
image_height = 640
image_width = 640
dummy_input = pt.randn(
    batch_size, 
    channels, 
    image_height, 
    image_width
).to(device)

In [11]:
pt.onnx.export(
    model, 
    dummy_input, # This will break!
    'retinanet.onnx', 
    verbose=True, 
    input_names=input_names, 
    output_names=output_names
)



IndexError: too many indices for tensor of dimension 3

So, what happened here? When I looked into how the input image propagates through our detection model, I learned that the core of this model consists of two parts, `backbone` and `head`; everything else is more or less post-processing. What is currently inconvenient for us is that the `RetinaNet` module that wraps these two expects specific formats for its input and output. One way to work around this is to "extract" these two modules and combine them into a single exportable module.
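
To get a feel for what the `backbone` and `head` pair should return for the 640 by 640 dummy input, the output shapes can be estimated by hand. The sketch below is only an approximation under stated assumptions: the FPN strides (8 through 128 for `p3` through `p7`), 9 anchors per location, and 80 COCO classes are the usual RetinaNet defaults as far as I know, and none of them is read from the `cfg` above. `head_output_shapes` is a hypothetical helper, not part of Detectron2.

```python
# Back-of-the-envelope output shapes for the backbone + head pair.
# Assumed (not read from the notebook's cfg): FPN strides for p3..p7,
# 9 anchors per location, and 80 COCO classes.
FPN_STRIDES = {"p3": 8, "p4": 16, "p5": 32, "p6": 64, "p7": 128}
NUM_ANCHORS = 9
NUM_CLASSES = 80


def head_output_shapes(batch_size, height, width):
    """Return (level, cls_score shape, bbox_pred shape) for each FPN level."""
    shapes = []
    for level, stride in FPN_STRIDES.items():
        # Ceiling division; exact when the input size is a multiple of the stride.
        h = -(-height // stride)
        w = -(-width // stride)
        cls_shape = (batch_size, NUM_ANCHORS * NUM_CLASSES, h, w)
        box_shape = (batch_size, NUM_ANCHORS * 4, h, w)
        shapes.append((level, cls_shape, box_shape))
    return shapes


for level, cls_shape, box_shape in head_output_shapes(1, 640, 640):
    print(level, cls_shape, box_shape)
```

Under these assumptions the head yields one classification map and one box-regression map per level, from 80 by 80 at `p3` down to 5 by 5 at `p7`, which is worth keeping in mind when choosing `output_names` for the export.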

In [12]:
class DeployableDetectron2Model(nn.Module):
    def __init__(self, detectron2_model):
        super().__init__()
        self.backbone = detectron2_model.backbone
        self.head = detectron2_model.head
        self.in_features = detectron2_model.in_features
        
    def forward(self, x):
        features = self.backbone(x)
        features = [features[f] for f in self.in_features]
        return self.head(features)
    
    def export(self, filename, dummy_input, input_names, output_names, opset=10):
        # NOTE @jinyeom:
        #   This can potentially move to a more "global" place.
        @pt.onnx.symbolic_helper.parse_args('v', 'is')
        def upsample_nearest2d(g, x, output_size):
            h = float(output_size[-2]) / x.type().sizes()[-2]
            w = float(output_size[-1]) / x.type().sizes()[-1]
            return g.op(
                'Upsample', 
                x,
                scales_f=(1, 1, h, w),
                mode_s='nearest'
            )
        pt.onnx.symbolic_helper.upsample_nearest2d = upsample_nearest2d
        
        pt.onnx.export(
            self, 
            dummy_input, 
            filename,
            export_params=True,
            opset_version=opset,
            input_names=input_names, 
            output_names=output_names
        )
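
The custom `upsample_nearest2d` symbolic above boils down to one piece of arithmetic: each spatial scale is the requested output size divided by the input size, while the batch and channel axes keep a scale of 1. Here is a minimal, dependency-free sketch of that computation; `upsample_scales` is a hypothetical helper for illustration and is not part of PyTorch or ONNX.

```python
def upsample_scales(input_hw, output_hw):
    """Per-axis (N, C, H, W) scale factors, mirroring the symbolic's arithmetic."""
    in_h, in_w = input_hw
    out_h, out_w = output_hw
    # Batch and channel axes are never resized, so their scale is 1.
    return (1, 1, float(out_h) / in_h, float(out_w) / in_w)


# Doubling a 20x20 feature map to 40x40 gives a scale of 2.0 on both axes.
print(upsample_scales((20, 20), (40, 40)))

# Sizes that are not exact multiples give a non-integer scale.
print(upsample_scales((13, 13), (25, 25)))
```

Because the scales are derived from the traced tensor sizes, they are fixed at export time; a graph exported this way should be treated as specific to the dummy input's resolution.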

In [13]:
deployable = DeployableDetectron2Model(model).to(device)
del model # to save memory

In [14]:
deployable

DeployableDetectron2Model(
  (backbone): FPN(
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelP6P7(
      (p6): Conv2d(2048, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): B

Let's try this again!

In [15]:
deployable.export(
    'retinanet.onnx',
    dummy_input,
    input_names, 
    output_names
)

RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 3.95 GiB total capacity; 521.25 MiB already allocated; 35.56 MiB free; 28.75 MiB cached)

This is a bit problematic. Even with TensorRT 6 (the latest version at the time of writing), we're limited to `opset <= 10`, so we won't really know how this warning affects the performance of our model until we try it ourselves.