Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export to ONNX Error: [TypeInferenceError] Mismatched tensor element type: source=9 target=2 #4907

Closed
Byte247 opened this issue Apr 11, 2023 · 1 comment

Comments

@Byte247
Copy link

Byte247 commented Apr 11, 2023

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:
    The base export_model under tools/deploy. The only thing I changed was line 166 to: "aug = T.ResizeShortestEdge(
    [1344, 1344], 1344
    )" as provided in https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/detectron2

  2. What exact command you run:
    python3 /home/tom/ws/utils/tensor_rt/detectron2/tools/deploy/export_model.py --config-file /home/tom/ws/utils/tensor_rt/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --output /home/tom/ws/vision/deploy/mask_rcnn --format onnx --sample-image /home/tom/ws/vision/deploy/image_1344.png --export-method tracing MODEL.DEVICE cuda MODEL.WEIGHTS /home/tom/Downloads/model_final_f10217.pkl

  3. Full logs or other relevant observations:

Command line arguments: Namespace(config_file='/home/tom/ws/utils/tensor_rt/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', export_method='tracing', format='onnx', opts=['MODEL.DEVICE', 'cuda', 'MODEL.WEIGHTS', '/home/tom/Downloads/model_final_f10217.pkl'], output='/home/tom/ws/vision/deploy/mask_rcnn', run_eval=False, sample_image='/home/tom/ws/vision/deploy/pen_1344.png')
[W init.cpp:833] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 1) (function operator())
[04/11 11:34:43 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /home/tom/Downloads/model_final_f10217.pkl ...
inference: <function export_tracing..inference at 0x7fa2f9431e50>
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/image_list.py:85: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert t.shape[:-2] == tensors[0].shape[:-2], t.shape
/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/modeling/proposal_generator/proposal_utils.py:106: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not valid_mask.all():
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
h, w = box_size
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/layers/nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert boxes.shape[-1] == 4
/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch/init.py:1209: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/layers/roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert rois.dim() == 2 and rois.size(1) == 5
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/modeling/roi_heads/fast_rcnn.py:138: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not valid_mask.all():
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
h, w = box_size
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/modeling/roi_heads/fast_rcnn.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_bbox_reg_classes == 1:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/layers/nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert boxes.shape[-1] == 4
/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch/init.py:1209: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/layers/roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert rois.dim() == 2 and rois.size(1) == 5
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/modeling/roi_heads/mask_head.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if cls_agnostic_mask:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
h, w = box_size
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/tom/ws/utils/tensor_rt/detectron2/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5589: UserWarning: Exporting aten::index operator of advanced indexing in opset 16 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
warnings.warn(
/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch/onnx/utils.py:1636: UserWarning: The exported ONNX model failed ONNX shape inference.The model will not be executable by the ONNX Runtime.If this is unintended and you believe there is a bug,please report an issue at https://github.com/pytorch/pytorch/issues.Error reported by strict ONNX shape inference: [ShapeInferenceError] Shape inference error(s): (op_type:If, node name: If_2550): [TypeInferenceError] Mismatched tensor element type: source=9 target=2
(Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1407.)
_C._check_onnx_proto(proto)
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[04/11 11:34:48 detectron2]: Inputs schema: TupleSchema(schemas=[ListSchema(schemas=[DictSchema(schemas=[IdentitySchema()], sizes=[1], keys=['image'])], sizes=[1])], sizes=[1])
[04/11 11:34:48 detectron2]: Outputs schema: ListSchema(schemas=[DictSchema(schemas=[DictSchema(schemas=[InstancesSchema(schemas=[TensorWrapSchema(class_name='detectron2.structures.Boxes'), IdentitySchema(), IdentitySchema(), IdentitySchema()], sizes=[1, 1, 1, 1], keys=['pred_boxes', 'pred_classes', 'pred_masks', 'scores'])], sizes=[5], keys=['instances'])], sizes=[5], keys=['instances'])], sizes=[5])
None
[04/11 11:34:48 detectron2]: Success.

Expected behavior:

.onnx file that does not return the "ShapeInferenceError"

Environment:


sys.platform linux
Python 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
numpy 1.23.2
detectron2 0.6 @/home/tom/ws/utils/tensor_rt/detectron2/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.7
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE
PyTorch 2.0.0+cu118 @/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 3060 Laptop GPU (arch=8.6)
Driver version 520.61.05
CUDA_HOME /usr/local/cuda-11.8
Pillow 9.3.0
torchvision 0.15.1+cu118 @/home/tom/ws/utils/tensor_rt/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.7.0


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.8
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.8.1
    • Built with CuDNN 8.7
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
@Byte247
Copy link
Author

Byte247 commented Apr 11, 2023

I dont know how, but changing to a different STABLE_ONNX_OPSET_VERSION and back to 16 solved it

@Byte247 Byte247 closed this as completed Apr 11, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant