
failed to convert torch.jit.ScriptModule to ONNX (crash) #30512

Closed
lironmo opened this issue Nov 27, 2019 · 12 comments
@lironmo lironmo commented Nov 27, 2019

🐛 Bug

When converting a torch.jit.ScriptModule to ONNX, the export crashes with the following exception:

Traceback (most recent call last):
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 382, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 262, in _model_to_graph
    fixed_batch_size=fixed_batch_size)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 132, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/__init__.py", line 174, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 619, in _run_symbolic_function
    return op_fn(g, *inputs, **attrs)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 124, in wrapper
    return fn(g, *args)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 862, in batch_norm
    if len(input_sizes) == 2:
TypeError: object of type 'NoneType' has no len() (occurred when translating batch_norm)

To Reproduce

Load the attached TorchScript module and try to convert it to ONNX:

def convert(self):
    loaded = torch.jit.load(self._torch_script_path)
    # loaded.load_state_dict(self._model_state)
    dummy_input = torch.randn(1, 3, 224, 224)
    target = loaded(dummy_input)
    torch.onnx.export(loaded, dummy_input, self._out_onnx_path, verbose=True,
                      operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
                      example_outputs=target)

cc @suo @houseroad @spandantiwari @lara-hdr @BowenBao @neginraoof

@lironmo lironmo commented Nov 27, 2019

file: https://filebin.net/scu91052e8txtl4r

Clean repro code:

import torch

loaded = torch.jit.load("/app_data/test_torch_script/torch_script_test.zip")
dummy_input = torch.randn(1, 3, 224, 224)
target = loaded(dummy_input)
torch.onnx.export(loaded, dummy_input, "out.onnx", verbose=True,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
                  example_outputs=target)

xuhdev added a commit to xuhdev/pytorch-xla that referenced this issue Nov 28, 2019
@lironmo lironmo changed the title failed to convert torch.jit.ScriptModule to ONNX failed to convert torch.jit.ScriptModule to ONNX (crash) Nov 28, 2019
@lironmo lironmo commented Nov 28, 2019

When I compare it to a normal export (creating the model from code and exporting), I get the following:

input
133 defined in (%133 : Float(1, 64, 112, 112) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[7, 7], pads=[3, 3, 3, 3], strides=[2, 2]](%input.1, %conv1.weight), scope: ResNet/Conv2d[conv1] # /home/liron/envs/detectron/lib/python3.6/site-packages/torch/nn/modules/conv.py:342:0
)

input.type().sizes()
[1, 64, 112, 112]

Whereas when I try to export from the loaded TorchScript module:

input
114 defined in (%114 : Tensor = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[7, 7], pads=[3, 3, 3, 3], strides=[2, 2]](%input.1, %102) # code/torch/torch/nn/modules/container.py:213:13
)

and

input_sizes = input.type().sizes()
type(input_sizes)
<class 'NoneType'>
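The difference can be reproduced without the full model: a scripted module's graph carries unshaped Tensor types, while a traced module's graph records the concrete sizes of the example input. A minimal sketch (the Tiny module is illustrative, not the model from this issue):

```python
import torch

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

# torch.jit.script records no concrete shapes, so graph values are typed
# as plain "Tensor"; torch.jit.trace bakes in the sizes of the example input.
scripted = torch.jit.script(Tiny())
traced = torch.jit.trace(Tiny(), torch.randn(1, 3))

print(scripted.graph)  # inputs typed as Tensor (no sizes)
print(traced.graph)    # inputs typed as Float(1, 3, ...)
```

This is why `input.type().sizes()` returns None in the scripted case: the ONNX symbolic for batch_norm asks for sizes that were never recorded.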

dlibenzi added a commit to pytorch/xla that referenced this issue Nov 28, 2019
@lironmo lironmo commented Nov 28, 2019

@dlibenzi - can you explain why your commit is related to this issue?

@dlibenzi dlibenzi commented Dec 3, 2019

@dlibenzi - can you explain why your commit is related to this issue?

Should I know? 😄

@suo suo commented Dec 3, 2019

@houseroad who is the right person to look at this?

@lara-hdr lara-hdr commented Dec 3, 2019

@lironmo what version of PyTorch are you using?

@lironmo lironmo commented Dec 4, 2019

@lara-hdr 1.3.0, and also tested on 1.3.1

@lara-hdr lara-hdr commented Dec 4, 2019

@lironmo, the issue is that the shape information of the tensors is not always available when scripting. The ONNX exporter needs this information in certain cases where PyTorch and ONNX operators' behaviors don't align perfectly.

Batch_norm was recently updated to export without the shape information in PR #29458, so the error you are getting with batch_norm is now fixed on master.

However, when I tried exporting your model with PyTorch master, I got a similar error with flatten.
I submitted a PR (#30751) with some improvements for flatten in opset 11 that solves the problem in your case.

Once this PR is merged, you should be able to export your model with opset_version=11 (pass opset_version=11 to the exporter API) using the PyTorch nightly.


@lironmo lironmo commented Dec 5, 2019

@lara-hdr thanks for your reply :)

I also tried to install the nightly build, create a new traced model, and convert it to ONNX, but I get the same problem with the batch norm layer (see trace below).

(convert) liron@liron-Latitude-5490:~/work/pixoneye/model_conversion$ pip freeze | grep -i torch
torch-nightly==1.2.0.dev20190805
torchvision==0.4.1

So do I need to wait for the next nightly build?

I uploaded the new traced model to the bin (with a night_build suffix, https://filebin.net/scu91052e8txtl4r).

About flatten, I will wait for the fix.

trace:
Traceback (most recent call last):
  File "/home/liron/work/pixoneye/model_conversion/test.py", line 17, in <module>
    out_onnx_path=out_onnx_script, transform_yaml_path=transform_yaml)
  File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 28, in convert_from_torch_script
    cls._load_model_weights_and_export(model, model_state_dict, out_onnx_path, transform_yaml_path)
  File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 76, in _load_model_weights_and_export
    example_outputs=target)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/__init__.py", line 132, in export
    strip_doc_string, dynamic_axes)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 64, in export
    example_outputs=example_outputs, strip_doc_string=strip_doc_string, dynamic_axes=dynamic_axes)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 329, in _export
    _retain_param_name, do_constant_folding)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 225, in _model_to_graph
    _disable_torch_constant_prop=_disable_torch_constant_prop)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 127, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/__init__.py", line 163, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 564, in _run_symbolic_function
    return op_fn(g, *inputs, **attrs)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 146, in wrapper
    return fn(g, *args)
  File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 876, in batch_norm
    input_sizes = input.type().sizes()
RuntimeError: r INTERNAL ASSERT FAILED at /pytorch/aten/src/ATen/core/jit_type.h:155, please report a bug to PyTorch. (expect at /pytorch/aten/src/ATen/core/jit_type.h:155)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fb44d43e273 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: std::shared_ptr<c10::CompleteTensorType> c10::Type::expect<c10::CompleteTensorType>() + 0x1d3 (0x7fb4adc20413 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0x492412 (0x7fb4adc46412 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x1d5484 (0x7fb4ad989484 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyCFunction_FastCallKeywords + 0x1eb (0x52393b in /home/liron/envs/convert/bin/python)
frame #5: /home/liron/envs/convert/bin/python() [0x57dc05]
frame #6: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #7: PyEval_EvalCodeEx + 0x285 (0x57dfd5 in /home/liron/envs/convert/bin/python)
frame #8: /home/liron/envs/convert/bin/python() [0x4fbb33]
frame #9: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #10: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #11: PyEval_EvalCodeEx + 0x5b2 (0x57e302 in /home/liron/envs/convert/bin/python)
frame #12: /home/liron/envs/convert/bin/python() [0x4fbc34]
frame #13: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #14: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #15: PyEval_EvalCodeEx + 0x5b2 (0x57e302 in /home/liron/envs/convert/bin/python)
frame #16: /home/liron/envs/convert/bin/python() [0x4fbc34]
frame #17: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #19: PyEval_EvalCodeEx + 0x285 (0x57dfd5 in /home/liron/envs/convert/bin/python)
frame #20: /home/liron/envs/convert/bin/python() [0x4fbb33]
frame #21: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #22: torch::jit::BlockToONNX(torch::jit::Block*, torch::jit::Block*, torch::onnx::OperatorExportTypes, std::unordered_map<torch::jit::Value*, torch::jit::Value*, std::hash<torch::jit::Value*>, std::equal_to<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, torch::jit::Value*> > >) + 0x4b2 (0x7fb4adc0ee52 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #23: torch::jit::ToONNX(std::shared_ptr<torch::jit::Graph>&, torch::onnx::OperatorExportTypes) + 0x2c2 (0x7fb4adc106e2 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #24: + 0x4548d3 (0x7fb4adc088d3 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #25: + 0x1d5484 (0x7fb4ad989484 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #26: _PyCFunction_FastCallKeywords + 0x1eb (0x52393b in /home/liron/envs/convert/bin/python)
frame #27: /home/liron/envs/convert/bin/python() [0x57da79]
frame #28: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #29: /home/liron/envs/convert/bin/python() [0x57535f]
frame #30: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #31: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #32: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #33: /home/liron/envs/convert/bin/python() [0x57535f]
frame #34: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #35: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #36: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #37: /home/liron/envs/convert/bin/python() [0x575716]
frame #38: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #39: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #40: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #41: /home/liron/envs/convert/bin/python() [0x57535f]
frame #42: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #43: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #44: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #45: /home/liron/envs/convert/bin/python() [0x57535f]
frame #46: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #47: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #48: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #49: /home/liron/envs/convert/bin/python() [0x57eb2d]
frame #50: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #51: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #52: /home/liron/envs/convert/bin/python() [0x57535f]
frame #53: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #54: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #55: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #56: /home/liron/envs/convert/bin/python() [0x57535f]
frame #57: PyEval_EvalCode + 0x23 (0x5750d3 in /home/liron/envs/convert/bin/python)
frame #58: /home/liron/envs/convert/bin/python() [0x5ea0c2]
frame #59: PyRun_FileExFlags + 0x9a (0x5ea52a in /home/liron/envs/convert/bin/python)
frame #60: PyRun_SimpleFileExFlags + 0x1a7 (0x5ea2e7 in /home/liron/envs/convert/bin/python)
frame #61: Py_Main + 0x623 (0x5ef7f3 in /home/liron/envs/convert/bin/python)
frame #62: main + 0xe9 (0x4d1f09 in /home/liron/envs/convert/bin/python)
frame #63: __libc_start_main + 0xf0 (0x7fb4b1aea830 in /lib/x86_64-linux-gnu/libc.so.6)

@lara-hdr lara-hdr commented Dec 6, 2019

@lironmo, torch nightly should be version 1.4.0.dev, not 1.2.0.
With the nightly you won't get the error with batch norm, but it will fail on flatten.
Once #30751 is merged, you can get the latest nightly to export your model.
