
ONNX export of MaskRCNN: dynamic axes seem broken (for batch size > 1) #2311

Open

zuzaanto opened this issue Jun 10, 2020 · 6 comments

@zuzaanto

🐛 Bug

When exporting to ONNX, dynamic axes were set and the inputs and outputs were named properly. However, the output for inferred images is incorrect and wrongly named. Depending on the batch sizes used at export and at inference, the behaviour varies as follows.
Suppose the batch size at export time is n and the batch size at inference time is m:

  1. if n==m:
    The output has length n*4, e.g. if n=m=3 the output has length 12. In the ONNX runtime session it looks like the following screenshot:
    [screenshot of the ONNX runtime session outputs]
    So only the output of the first image in the batch is correctly named.
  2. if n<m:
    Similar behaviour as in 1., but output is returned only for the first n images in the batch.
  3. if n>m:
    A "SplitToSequence_XXXX" error is returned, e.g.:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SplitToSequence node. Name:'SplitToSequence_4001' Status Message: split_size_sum (57) != split_dim_size (40)

This exception is similar to the behaviour described in #2309 and seems related.

To Reproduce

Steps to reproduce the behavior:
  1. Load and export a pretrained MaskRCNN model using an input_tensor of shape (n, 3, 1024, 1024), for example (4, 3, 1024, 1024):

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=False,
    min_size=1024, max_size=1024,
    pretrained_backbone=False,
    num_classes=num_classnames + 1,  # + background class
    image_mean=image_mean,
    image_std=image_std,
)
torch.onnx.export(
    model,
    input_tensor.float(),
    onnx_model_filepath,
    export_params=True,
    opset_version=12,
    do_constant_folding=False,
    input_names=["images_tensors"],
    output_names=["boxes", "labels", "scores", "masks"],
    dynamic_axes={
        "images_tensors": [0, 1, 2, 3],
        "boxes": [0, 1],
        "labels": [0],
        "scores": [0],
        "masks": [0, 1, 2, 3],
    },
)
  2. Load and run inference with the ONNX model on an input_tensor of shape (m, 3, 1024, 1024), where m corresponds to the value in the description above; different m values (bigger than, smaller than or equal to n) result in the different behaviours.
input_array = input_tensor.cpu().numpy()
ort_session = onnxruntime.InferenceSession(onnx_model_filepath)
ort_inputs = {"images_tensors": input_array}
ort_outs = ort_session.run(None, ort_inputs)
outputs = ort_session.get_outputs()

These outputs are presented in the screenshot above.
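For what it's worth, given the observed flat output list of length n*4, the outputs can be regrouped into per-image dicts with a small helper. This is just a sketch; `group_outputs` and the assumed output ordering (boxes, labels, scores, masks repeated per image, as suggested by the screenshot) are my own, not part of the torchvision or onnxruntime API:

```python
import numpy as np

# Hypothetical helper: regroup the flat list returned by ort_session.run()
# into one dict per image, assuming the ordering
# [boxes_0, labels_0, scores_0, masks_0, boxes_1, labels_1, ...].
def group_outputs(ort_outs, keys=("boxes", "labels", "scores", "masks")):
    k = len(keys)
    assert len(ort_outs) % k == 0, "flat output length must be a multiple of len(keys)"
    return [dict(zip(keys, ort_outs[i:i + k])) for i in range(0, len(ort_outs), k)]
```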

Expected behavior

With dynamic_axes set properly, I expect:

  1. Output length dependent on batch size of the inferred tensor, not the one used for export.
  2. Output structured like the torch model's output, which is a list (len == batch size) of dictionaries of boxes, labels, scores and masks. Also, all outputs correctly named, unlike in the screenshot above.
  3. No exceptions if inferred tensor is of smaller batch size than the one used for export.

Environment

PyTorch version: 1.6.0.dev20200526+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 20.04 LTS
GCC version: (Ubuntu 9.3.0-10ubuntu2) 9.3.0
CMake version: version 3.16.3

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti

Nvidia driver version: 440.64
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.6.0.dev20200526+cu101
[pip3] torchvision==0.7.0.dev20200526+cu101
[conda] Could not collect

Also:

onnxruntime and onnxruntime-gpu==1.3.0
onnx==1.7.0

Additional context

This seems connected to #2309 and #2251

@fmassa
Member

fmassa commented Jun 10, 2020

cc @neginraoof

@neginraoof
Contributor

Thanks for reporting this issue. Looks like issues #2309 and #2311 are similar. Will take a look.

@neginraoof
Contributor

neginraoof commented Jun 16, 2020

Actually, it looks like this is a known issue in export of models for batch size > 1, and current tests only cover batch size = 1.
We have a long-term plan to address this limitation.

cc @fmassa

@FraPochetti

FraPochetti commented Jan 13, 2021

@neginraoof

> Actually, it looks like this is a known issue in export of models for batch size > 1, and current tests only cover batch size = 1.
> We have a long-term plan to address this limitation.

Has this been addressed, by any chance?
I am trying to play around with dynamic_axes with torchvision models (faster_rcnn, mask_rcnn, etc) and I cannot seem to get it to work.
I asked on the forums too, with no luck.
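Until dynamic batch export works, one workaround (a sketch only; the `infer_batch` helper and its argument names are mine, not an official API) is to export with batch size n=1 and loop over the batch at inference time, running the session once per image:

```python
import numpy as np

# Workaround sketch: run the ONNX model one image at a time instead of
# relying on the dynamic batch axis. `session` is assumed to be an
# onnxruntime.InferenceSession for a model exported with batch size 1.
def infer_batch(session, input_array, input_name="images_tensors"):
    results = []
    for img in input_array:  # (m, 3, H, W) -> m runs of shape (1, 3, H, W)
        boxes, labels, scores, masks = session.run(None, {input_name: img[None]})
        results.append(
            {"boxes": boxes, "labels": labels, "scores": scores, "masks": masks}
        )
    return results
```

This sidesteps the SplitToSequence error entirely, at the cost of m separate runs.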

@BlakeXiaochu

BlakeXiaochu commented Jan 13, 2021

Same problem. Is there any plan to support dynamic batch inference now?

@EmanuelAlogna

Is there an ETA for support of dynamic batch sizes > 1?

6 participants