Pytorch2onnx #3075

Merged
merged 14 commits into from Aug 4, 2020

Conversation

drcut
Contributor

@drcut drcut commented Jun 19, 2020

Support converting RetinaNet from PyTorch to ONNX.
The computation results can be verified between PyTorch and ONNX.
This PR does several things:
[1] Replace some PyTorch ops that are not supported by ONNX
[2] Replace some dynamic shapes with static shapes, as ONNX only supports constant shapes in some cases
[3] Fix some bugs in PyTorch 1.3's ONNX conversion, which may cause numerical errors when running with onnxruntime
[4] Update the tools/pytorch2onnx.py file with our new API

@drcut
Contributor Author

drcut commented Jun 19, 2020

What does "from onnx_util.symbolic import register_extra_symbolics" mean? I do not have this module. @drcut

This function is in tools/onnx_util/symbolic.py. It is for users on PyTorch 1.3. There are some bugs in the ONNX part of PyTorch 1.3, and it also does not implement the TopK op. So you can regard register_extra_symbolics as a mock patch.
Thanks
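
To make the "mock patch" idea concrete, here is a minimal, self-contained sketch of the general pattern: an exporter looks up a per-op conversion function in a registry, and a patch registers the missing entries before export. The registry, the function names and the returned strings are all hypothetical and only imitate the idea; this is not the actual implementation in PyTorch or in tools/onnx_util/symbolic.py.

```python
# Toy model of an "op name -> symbolic function" registry.
# Every name here is hypothetical and exists only for illustration.
_registry = {}


def register_symbolic(op_name, fn):
    """Register (or override) the conversion function for one op."""
    _registry[op_name] = fn


def convert(op_name, *args):
    """Look up the conversion function; a missing op raises KeyError."""
    return _registry[op_name](*args)


def register_extra_symbolics_sketch():
    """Patch in an op the toy exporter lacks, in the spirit of a mock patch."""
    register_symbolic("topk", lambda x, k: f"TopK({x}, k={k})")


def try_convert(op_name, *args):
    """Convert an op, reporting a missing registry entry instead of crashing."""
    try:
        return convert(op_name, *args)
    except KeyError as err:
        return f"unsupported op: {err}"


before = try_convert("topk", "scores", 1000)  # registry entry is missing
register_extra_symbolics_sketch()
after = try_convert("topk", "scores", 1000)   # entry was patched in
```

A missing registry entry is also the kind of failure that surfaces as a KeyError for an op name during export, which is why the patch has to be applied before the export call.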

@drcut
Contributor Author

drcut commented Jun 19, 2020

Can you provide me with an ONNX model of RetinaNet? I need it for testing. The COCO dataset with ResNet-50 FPN is fine. Thank you @drcut

As the exported ONNX model does not support dynamic input shapes, I believe it is much more convenient for you to run the code locally, so that you can set whatever input shape you want.
Could you please generate the model with the command below:
python tools/pytorch2onnx.py configs/retinanet/retinanet_r50_fpn_1x_coco.py --checkpoint checkpoints/retinanet/retinanet_r50_fpn_1x_20181125-3d3c2142.pth --verify --shape 20 20 --output_file retinanet.onnx
If anything goes wrong, please tell me and maybe I can help.
Thanks

@drcut
Contributor Author

drcut commented Jun 19, 2020

I used the command "python tools/pytorch2onnx.py --config ./configs/retinanet/retinanet_r50_fpn_1x_coco.py --checkpoint ./retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth --output_file ./6n.onnx", but got the error "RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray". I urgently need a working RetinaNet ONNX model for testing, because I report to my teacher tomorrow, so I hope you can provide one. Thank you @drcut

OK, just for this emergency case :)
Would you please provide me the input shape you need?

@drcut
Contributor Author

drcut commented Jun 19, 2020

My email is "manhongnie@gmail.com"

I just converted a model to ONNX. There is some numerical difference between PyTorch and ONNX, which may be due to the dummy input. Anyway, here is the link:
https://pan.baidu.com/s/1GP-si3oDdTndoC82tZgQlA (password: dbju)
Hope this can help you

@drcut
Contributor Author

drcut commented Jun 19, 2020

Okay, thank you, in fact, my input size does not matter, I just used it for testing.
@drcut

OK, good luck with your meeting tomorrow!

@hellock
Member

hellock commented Jun 19, 2020

Have you tested converting mmdetection to TensorRT? I have not succeeded.

You need to implement it by yourself. It is not supported yet.

@drcut
Contributor Author

drcut commented Jun 20, 2020

OK, but why does it even run incorrectly in onnxruntime? Is there an ONNX node that is not supported?
@drcut

First, I used a pretrained PyTorch model for the conversion. However, this .pth file is not fully compatible with the master code, which means some ops have no corresponding pretrained weights, so the original PyTorch result is also rather poor. Second, although I am not sure, I believe there is some numerical difference between PyTorch and onnxruntime; I will figure it out next week. Finally, why did you say Resize should be replaced by Upsample? Resize is a standard ONNX op that can be executed by ONNX Runtime directly.

@drcut
Contributor Author

drcut commented Jun 20, 2020

Because that layer is an Upsample according to the model structure, but it is exported as Resize in opset 11, and its output is still problematic, so it needs to be replaced and rewritten. The same applies to TopK. When I said the onnxruntime test result is wrong, I meant that nothing is detected at all. If it were only a small numerical error that did not affect the result, I would consider it correct.
NVIDIA engineers use this piece of code to replace Upsample:

import torch
import torch.onnx.symbolic_opset10 as onnx_symbolic

def upsample_nearest2d(g, input, output_size, *args):
    # Currently, the TRT 5.1/6.0/7.0 ONNX parser does not support all ONNX ops
    # needed to support the dynamic upsampling ONNX formulation.
    # Here we hardcode scale=2 as a temporary workaround.
    scales = g.op("Constant", value_t=torch.tensor([1., 1., 2., 2.]))
    return g.op("Resize", input, scales, mode_s="nearest")

# Override the opset 10 symbolic so that export emits the fixed-scale Resize.
onnx_symbolic.upsample_nearest2d = upsample_nearest2d
@drcut

I have only tested the ONNX model on CPU using onnxruntime. Maybe it does not work on GPU. In fact, we have been working on another part that supports converting from ONNX to TensorRT, but it has not been published yet. I'm afraid you will have to implement the corresponding part yourself.

@drcut
Contributor Author

drcut commented Jun 20, 2020

Can you give me your onnxruntime test code? Please take a look at my test results; is this the result I should expect? As for the TensorRT part, I have also written the decode and NMS parts, but I am stuck without a correct ONNX model.
@drcut

Please see tools/pytorch2onnx.py; the onnxruntime code is executed when using --verify.
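
For readers who want the gist of such a verification step without opening the script, the sketch below compares two lists of output arrays within a tolerance. It is an illustration only, not the code in tools/pytorch2onnx.py: the helper name outputs_match and the tolerance values are assumptions, and the arrays are stand-ins for the real PyTorch and onnxruntime outputs.

```python
import numpy as np


def outputs_match(pytorch_out, onnx_out, rtol=1e-3, atol=1e-5):
    """Return True if both lists of arrays agree in shape and value."""
    if len(pytorch_out) != len(onnx_out):
        return False
    return all(
        a.shape == b.shape and np.allclose(a, b, rtol=rtol, atol=atol)
        for a, b in zip(pytorch_out, onnx_out)
    )


# Stand-in arrays. In practice one list would come from the PyTorch model
# and the other from an onnxruntime session fed the same input.
reference = [np.array([[10.0, 20.0, 50.0, 60.0, 0.9]], dtype=np.float32)]
close = [reference[0] + 1e-6]      # tiny numerical noise
different = [reference[0] + 1.0]   # clearly different result

same_ok = outputs_match(reference, close)
diff_ok = outputs_match(reference, different)
```

A tolerance-based check like this separates small numerical noise from a genuinely wrong export, which is the distinction discussed earlier in this thread.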

@drcut
Contributor Author

drcut commented Jun 20, 2020

Can you tell me about the environment you are running in? I can't run it.
@drcut

PyTorch 1.3, Python 3.7.5. Please make sure you use the correct branch.

@drcut
Contributor Author

drcut commented Jun 22, 2020

Can you do me a favor? Could you tell me how to add a new Sigmoid node to an ONNX model, or how to view the output of an intermediate node in an ONNX model?
@drcut

Sorry, I do not know how to modify an ONNX model directly. The ONNX model is just an intermediate file used to convert PyTorch to other backend engines.

@drcut
Contributor Author

drcut commented Jun 22, 2020

So how do you deal with ops that are not supported by TensorRT, such as "NonZero", "GatherND" and "NonMaxSuppression"? Can your TensorRT implementation of RetinaNet be made public?
@drcut

Maybe in the future :). We implement TensorRT plugins for customized ops.

@manhongnie

After testing, your ONNX model is wrong. You can simplify the input and output.
@drcut

@drcut
Contributor Author

drcut commented Jun 22, 2020

After testing, your ONNX model is wrong. You can simplify the input and output.
@drcut

Would you please describe why it's incorrect?

@manhongnie

For TopK and Upsample, there is no corresponding operation, and the output is wrong.
@drcut

@manhongnie

I checked that ONNX supports these operations, but the ONNX model you converted does not contain them. You could check whether your PyTorch 1.3 supports these operations.
@drcut

@@ -93,6 +94,10 @@ def simple_test(self, img, img_metas, rescale=False):
outs = self.bbox_head(x)
bbox_list = self.bbox_head.get_bboxes(
*outs, img_metas, rescale=rescale)
# return in advance when export to ONNX
xvjiarui (Collaborator), Jun 29, 2020

Suggested change
- # return in advance when export to ONNX
+ # skip post-processing when exporting to ONNX

@@ -97,8 +97,9 @@ def bbox2result(bboxes, labels, num_classes):
if bboxes.shape[0] == 0:
return [np.zeros((0, 5), dtype=np.float32) for i in range(num_classes)]
else:
bboxes = bboxes.cpu().numpy()
labels = labels.cpu().numpy()
if isinstance(bboxes, torch.Tensor):
Collaborator

Is this necessary? If so, we need to update the docstring.

Contributor Author

After executing bbox2result, MMDet returns an np.ndarray, which is not supported by ONNX export (tracing can only record tensor ops, not NumPy ops), so we can only convert the part before it into ONNX. Therefore, to compare the results between PyTorch and ONNX, we also have to use bbox2result to convert the ONNX output. In that case, the input of bbox2result is an np.ndarray (the output type of onnxruntime).

xvjiarui (Collaborator), Jul 5, 2020

We may keep this part unchanged and use [bboxes[labels == i, :] for i in range(num_classes)] in pytorch2onnx() instead.
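
As a small illustration of the list comprehension suggested above, the sketch below groups NumPy detections by class with boolean masks. The helper name group_by_class and the sample boxes are made up for this example; only the indexing expression and the empty-input branch come from the snippets quoted in this review.

```python
import numpy as np


def group_by_class(bboxes, labels, num_classes):
    """Split (n, 5) detections into one array per class via boolean masks."""
    if bboxes.shape[0] == 0:
        return [np.zeros((0, 5), dtype=np.float32)
                for _ in range(num_classes)]
    return [bboxes[labels == i, :] for i in range(num_classes)]


# Three made-up detections: x1, y1, x2, y2, score.
bboxes = np.array(
    [[0, 0, 10, 10, 0.9], [5, 5, 20, 20, 0.8], [1, 1, 4, 4, 0.7]],
    dtype=np.float32,
)
labels = np.array([1, 0, 1])
result = group_by_class(bboxes, labels, num_classes=3)
```

Because the inputs are already NumPy arrays, no .cpu().numpy() call is needed, so this form works directly on onnxruntime outputs.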

@xvjiarui
Collaborator

xvjiarui commented Jun 29, 2020

We may also need to update doc here.
We could list supported methods in the doc.

drcut (Contributor Author) left a comment

I have submitted a new version following the review, which mainly modifies how ga_rpn_head calls NMS.

mmdet/core/post_processing/bbox_nms.py (resolved)
mmdet/core/anchor/anchor_generator.py (resolved)
parser.add_argument(
'--out', type=str, required=True, help='output ONNX filename')
'--verify', action='store_true', help='verify the onnx model')
Collaborator

Suggested change
- '--verify', action='store_true', help='verify the onnx model')
+ '--verify', action='store_true', help='verify the onnx model output against pytorch output')

parser.add_argument('config', help='test config file path')
parser.add_argument('checkpoint', help='checkpoint file')
Collaborator

For object detection, we may make checkpoint a required argument. Without a checkpoint, some branches may not be covered.
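
A minimal sketch of an argument parser along these lines is shown below, with config and checkpoint as required positional arguments. The flag names --output_file, --shape and --verify appear in the commands quoted in this thread; the defaults and help strings are assumptions made for the example, not the script's actual values.

```python
import argparse


def parse_args(argv=None):
    """Parse converter options; config and checkpoint are both required."""
    parser = argparse.ArgumentParser(description="Convert a detector to ONNX")
    parser.add_argument("config", help="test config file path")
    parser.add_argument("checkpoint", help="checkpoint file")
    # The defaults below are placeholders chosen for this sketch.
    parser.add_argument("--output_file", type=str, default="tmp.onnx",
                        help="output ONNX filename")
    parser.add_argument("--shape", type=int, nargs="+", default=[800, 1216],
                        help="input image height and width")
    parser.add_argument("--verify", action="store_true",
                        help="verify the onnx model output against "
                             "pytorch output")
    return parser.parse_args(argv)


args = parse_args([
    "configs/retinanet/retinanet_r50_fpn_1x_coco.py",
    "retinanet.pth",  # placeholder checkpoint name
    "--verify", "--shape", "20", "20",
])
```

With checkpoint positional, omitting it makes argparse exit with a usage error instead of silently exporting a randomly initialized model.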

one_img = mmcv.imread(input_img, 'color')
one_img = mmcv.imresize(one_img, input_shape[2:]).transpose(2, 0, 1)
# normalize the input images
one_img = torch.from_numpy((one_img - 128) / 256).unsqueeze(0).float()
Collaborator

Why is the normalization fixed to 128?

Collaborator

I suggest making image norm a user input. The default could be imagenet mean/std.

Contributor Author

I suggest making image norm a user input. The default could be imagenet mean/std.

After some testing, I decided to remove the normalization part, since without this step we can still get a correct RetinaNet result with the default picture. Besides, MMDet will raise an error if NMS is not executed during ONNX tracing.

Collaborator

Removing img_norm may give incorrect results for other images.
I suggest making image norm a user input. The default could be imagenet mean/std.
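
One way to read this suggestion is sketched below: the mean and std become parameters that default to the commonly used ImageNet statistics, and the fixed (x - 128) / 256 scheme remains available as an explicit choice. The function name normalize_image is hypothetical, and the sketch assumes an RGB image on a 0-255 scale; an image loaded in BGR order would need the channels reordered first.

```python
import numpy as np

# Commonly used ImageNet statistics, RGB order, 0-255 scale.
IMAGENET_MEAN = (123.675, 116.28, 103.53)
IMAGENET_STD = (58.395, 57.12, 57.375)


def normalize_image(img_hwc, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize an HxWx3 image per channel; return a 1x3xHxW float32 array."""
    img = img_hwc.astype(np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    img = (img - mean) / std              # broadcasts over the channel axis
    return img.transpose(2, 0, 1)[None, ...]


dummy = np.full((4, 6, 3), 128, dtype=np.uint8)
batch = normalize_image(dummy)
# The previously hard-coded scheme, expressed through the same parameters.
legacy = normalize_image(dummy, mean=(128, 128, 128), std=(256, 256, 256))
```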

@drcut
Contributor Author

drcut commented Jul 24, 2020

OK. I have one more question about ONNX that I would like to ask you. I have an ONNX model whose TopK node is broken, and I now need to modify that node, but I do not know how to define it.

new_scale_node1 = onnx.helper.make_node(
    "TopK",
    inputs=['1006', '1009'],
    outputs=['1010', '1011'],
    # value=onnx.helper.make_tensor('value', onnx.TensorProto.FLOAT, [4], [1, 256, 160, 160])
    # values, indices = topk_sorted_implementation("1006", "1009", axis, largest)
)

Could you give me some guidance?

Sorry, I don't know.

@drcut
Contributor Author

drcut commented Jul 24, 2020

@drcut
I just built mmcv-full from source, but I got an error:
"RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray"
My command is
"python tools/pytorch2onnx.py configs/retinanet/retinanet_r50_fpn_1x_coco.py ./retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth --output_file ./6nm.onnx"
Although you may find me very annoying, I hope you can help me solve this problem, otherwise I will keep coming. Although my teacher may have given up on me, I have not.

All right, I will give you a full command list, please give me some time.

@drcut
Contributor Author

drcut commented Jul 27, 2020

@drcut
Then please let me know when you are done.

Hi, I have tried the following commands to convert RetinaNet, and I'm quite sure they work:
Step 1: Build PyTorch 1.3 (the PyTorch version is very important!)
Step 2: Download mmcv
git clone https://github.com/open-mmlab/mmcv.git
and build it from source (run inside the cloned mmcv directory)
MMCV_WITH_OPS=1 pip install --user -e .
Step 3: Download the corresponding mmdet
git clone https://github.com/open-mmlab/mmdetection.git onnx_mmdet
cd onnx_mmdet
and check out the correct branch
git fetch origin pull/3075/head:pull_3075
git checkout pull_3075
Step 4: Build mmdet
pip install --user -e .
Step 5: Install onnxruntime
pip install --user onnxruntime
Step 6: Convert RetinaNet
python -u tools/pytorch2onnx.py configs/retinanet/retinanet_r50_fpn_1x_coco.py retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth --verify --output_file 6nm.onnx

@drcut
Contributor Author

drcut commented Jul 27, 2020

@drcut
Do you mean that if I use pytorch1.5 it will not succeed?
OK I will test it now and follow your steps, but I still hope you can update to pytorch1.5

Yeah, because the ONNX symbolics in PyTorch 1.3 and PyTorch 1.5 are different. However, I did not hit the bug you reported when I used PyTorch 1.5.

@drcut
Contributor Author

drcut commented Jul 27, 2020

@drcut
Then why is there a problem when I use version 1.5, could you send us your installation environment?

I think it's because you used the wrong branch. You should pull my PR and check it out.
I just use a simple environment with Python 3.6.

@drcut
Contributor Author

drcut commented Jul 27, 2020

@drcut
I am using
"git clone https://github.com/drcut/mmdetection.git -b pytorch2onnx". Isn't this the right branch, or is there something wrong with my command?

This is the correct branch. I have no idea about your bug, as I cannot reproduce it.

@@ -182,6 +182,9 @@ def forward(self, inputs):
**self.upsample_cfg)
else:
prev_shape = laterals[i - 1].shape[2:]
# convert prev_shape from torch.Size to tuple
# so that we can convert F.interpolate into ONNX
prev_shape = tuple(int(e) for e in prev_shape)
Member

prev_shape = tuple(laterals[i - 1].shape[2:])

import torch
from mmcv.ops import RoIAlign, RoIPool
from mmcv.onnx.symbolic import register_extra_symbolics
Member

We can raise an error with a clear message if the mmcv version is too low.
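
A minimal sketch of such a guard follows. The minimum version string is a hypothetical placeholder, since the thread does not state which mmcv release first ships mmcv.onnx.symbolic, and the helper names are made up for this example.

```python
def parse_version(version):
    """Turn '1.0.5' into (1, 0, 5); anything after a non-digit is dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_min_version(installed, minimum, package="mmcv"):
    """Raise a descriptive error when the installed version is too old."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"{package}>={minimum} is required for ONNX export, "
            f"but {installed} is installed."
        )


HYPOTHETICAL_MIN_MMCV = "1.0.4"  # placeholder, not the real requirement
check_min_version("1.0.5", HYPOTHETICAL_MIN_MMCV)  # passes silently
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "1.10.0" would sort before "1.9.9".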

@hellock hellock merged commit 2f32a47 into open-mmlab:master Aug 4, 2020
@breAchyz

breAchyz commented Aug 6, 2020

I ran into a problem when converting RetinaNet to ONNX. Could you help me?
I trained RetinaNet on my own VOC dataset with 'retina_r50_fpn.py' and failed to convert it to ONNX using the code in pytorch2onnx.py. The error information is as follows:

Traceback (most recent call last):
File "/home/ding/pycharm-community-2017.1.3/helpers/pydev/pydevd.py", line 1585, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/ding/pycharm-community-2017.1.3/helpers/pydev/pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/ding/pycharm-community-2017.1.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/convert2onnx.py", line 160, in <module>
normalize_cfg=normalize_cfg)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/convert2onnx.py", line 58, in pytorch2onnx
opset_version=opset_version)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/onnx/__init__.py", line 168, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/onnx/utils.py", line 69, in export
use_external_data_format=use_external_data_format)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/onnx/utils.py", line 488, in _export
fixed_batch_size=fixed_batch_size)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/onnx/utils.py", line 334, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/onnx/utils.py", line 291, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/jit/__init__.py", line 278, in _get_trace_graph
outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/jit/__init__.py", line 361, in forward
self._force_outplace,
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/jit/__init__.py", line 348, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 548, in __call__
result = self._slow_forward(*input, **kwargs)
File "/home/ding/miniconda3/envs/yz-mmdet2.3-tensorrt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 534, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/core/fp16/decorators.py", line 51, in new_func
return old_func(*args, **kwargs)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/models/detectors/base.py", line 173, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/models/detectors/base.py", line 153, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/models/detectors/single_stage.py", line 112, in simple_test
*outs, img_metas, rescale=rescale)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 574, in get_bboxes
scale_factor, cfg, rescale)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 652, in _get_bboxes_single
cfg.max_per_img)
File "/home/ding/yz/Image_Recognize/mm_RetinaNet/mmdetection/mmdet/core/post_processing/bbox_nms.py", line 59, in multiclass_nms
raise RuntimeError('[ONNX Error] Can not record NMS '
RuntimeError: [ONNX Error] Can not record NMS as it has not been executed this time

My environment is PyTorch 1.5, mmdet 2.3.0, mmcv 1.0.5.

@drcut
Contributor Author

drcut commented Aug 6, 2020

Hi.
As PyTorch uses tracing to convert a model into ONNX, it can only record the operations that are actually executed. From your description, I think the model may not generate any valid bboxes for this input, so the program does not execute NMS, and the conversion result would be wrong.
So please use input data that produces valid bboxes. You can also apply some preprocessing to the data.
Besides, I have not tested on PyTorch 1.5. If anything goes wrong, maybe PyTorch 1.3 can help.
Best
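
The point about tracing can be illustrated with a toy recorder, sketched below in plain Python. ToyTracer and detect are hypothetical stand-ins rather than PyTorch or mmdet code; they only show why a data-dependent early return keeps NMS out of the recorded graph.

```python
class ToyTracer:
    """Records only the operations that actually run (hypothetical helper)."""

    def __init__(self):
        self.ops = []

    def record(self, name):
        self.ops.append(name)


def detect(scores, score_thr, tracer):
    """Toy detector: NMS is reached only if some score passes the threshold."""
    tracer.record("backbone")
    tracer.record("bbox_head")
    kept = [s for s in scores if s > score_thr]
    if not kept:
        # Early return: NMS never runs, so a trace of this call omits it.
        return []
    tracer.record("nms")
    return kept


# An input with no confident detections never reaches NMS.
empty_trace = ToyTracer()
detect([0.01, 0.02], score_thr=0.05, tracer=empty_trace)

# An input with valid detections records the full pipeline.
good_trace = ToyTracer()
detect([0.9, 0.3], score_thr=0.05, tracer=good_trace)
```

Here empty_trace.ops lacks "nms" while good_trace.ops contains it, which mirrors why the export needs an input image that produces at least one valid box.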

@breAchyz

breAchyz commented Aug 6, 2020

Thanks for your rapid reply.
That fixed this error, and the model was successfully converted to ONNX format.

@yhl41001

/localdev/anaconda3/envs/mmdet01/lib/python3.7/site-packages/torch/onnx/symbolic_registry.py", line 91, in get_registered_op
return _registry[(domain, version)][opname]
KeyError: 'new_zeros'
This error occurs during conversion. Can it be solved?

@tianwen0110

/localdev/anaconda3/envs/mmdet01/lib/python3.7/site-packages/torch/onnx/symbolic_registry.py", line 91, in get_registered_op
return _registry[(domain, version)][opname]
KeyError: 'new_zeros'
This error occurs during conversion. Can it be solved?

I have met this problem; it is caused by the PyTorch version. Updating PyTorch should solve it. My version is PyTorch 1.6.

@yhl41001

@tianwen0110 OK, thank you!
