Missing torchvision::nms error in the C++ CUDA TorchVision API #5697

@montmejat

🐛 Describe the bug

I'm unable to load my trained MaskRCNN model (the one from the torchvision Python module). I convert it to TorchScript with torch.jit.script, save it as a .pt file, and then load it with torch::jit::load from LibTorch:

torch::NoGradGuard no_grad;
model = torch::jit::load(model_path);

Nothing fancy, but I'm getting the following error:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown builtin op: torchvision::nms.
Could not find any similar ops to torchvision::nms. This op may not exist or may not be currently supported in TorchScript.
:
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40
        _log_api_usage_once(nms)
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 154
  _64 = __torch__.torchvision.extension._assert_has_ops
  _65 = _64()
  _66 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _66
'nms' is being compiled since it was called from '_batched_nms_vanilla'
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 108
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
        curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
                            ~~~ <--- HERE
        keep_mask[curr_indices[curr_keep_indices]] = True
    keep_indices = torch.where(keep_mask)[0]
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 83
    _31 = torch.index(boxes, _30)
    _32 = annotate(List[Optional[Tensor]], [curr_indices])
    curr_keep_indices = __torch__.torchvision.ops.boxes.nms(_31, torch.index(scores, _32), iou_threshold, )
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _33 = annotate(List[Optional[Tensor]], [curr_keep_indices])
    _34 = torch.index(curr_indices, _33)
'_batched_nms_vanilla' is being compiled since it was called from 'batched_nms'
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 35
    idxs: Tensor,
    iou_threshold: float) -> Tensor:
  _9 = __torch__.torchvision.ops.boxes._batched_nms_vanilla
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _10 = __torch__.torchvision.ops.boxes._batched_nms_coordinate_trick
  _11 = torch.numel(boxes)
'batched_nms' is being compiled since it was called from 'RegionProposalNetwork.filter_proposals'
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 72
    _11 = __torch__.torchvision.ops.boxes.clip_boxes_to_image
    _12 = __torch__.torchvision.ops.boxes.remove_small_boxes
    _13 = __torch__.torchvision.ops.boxes.batched_nms
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    num_images = (torch.size(proposals))[0]
    device = ops.prim.device(proposals)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 353
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
                        ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
        losses = {}
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 43
    proposals0 = torch.view(proposals, [num_images, -1, 4])
    image_sizes = images.image_sizes
    _8 = (self).filter_proposals(proposals0, objectness0, image_sizes, num_anchors_per_level, )
                                                                       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    boxes, scores, = _8
    losses = annotate(Dict[str, Tensor], {})

Aborted (core dumped)

I also tried running this project. It works on CPU but not on GPU, where I get this output:

Loading model
Model loaded
[W faster_rcnn.py:107] Warning: RCNN always returns a (Losses, Detections) tuple in scripting (function )
ok
output({}, [{boxes: [ CPUFloatType{0,4} ], labels: [ CPULongType{0} ], scores: [ CPUFloatType{0} ]}, {boxes: [ CPUFloatType{0,4} ], labels: [ CPULongType{0} ], scores: [ CPUFloatType{0} ]}])
terminate called after throwing an instance of 'c10::NotImplementedError'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchvision/models/detection/rpn.py", line 122, in forward
      lvl1 = torch.index(lvl0, _28)
      nms_thresh = self.nms_thresh
      keep1 = _13(boxes2, scores1, lvl1, nms_thresh, )
              ~~~ <--- HERE
      keep2 = torch.slice(keep1, 0, None, (self).post_nms_top_n())
      _29 = annotate(List[Optional[Tensor]], [keep2])
  File "code/__torch__/torchvision/ops/boxes.py", line 52, in batched_nms
    _16 = _17
  else:
    _18 = _10(boxes, scores, idxs, iou_threshold, )
          ~~~ <--- HERE
    _16 = _18
  return _16
  File "code/__torch__/torchvision/ops/boxes.py", line 109, in _batched_nms_coordinate_trick
    _47 = torch.unsqueeze(torch.slice(offsets), 1)
    boxes_for_nms = torch.add(boxes, _47)
    keep = __torch__.torchvision.ops.boxes.nms(boxes_for_nms, scores, iou_threshold, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _42 = keep
  return _42
  File "code/__torch__/torchvision/ops/boxes.py", line 154, in nms
  _64 = __torch__.torchvision.extension._assert_has_ops
  _65 = _64()
  _66 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _66

Traceback of TorchScript, original code (most recent call last):
  File "/home/aurelien/.local/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 266, in forward
    
            # non-maximum suppression, independently done per level
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
                   ~~~~~~~~~~~~~~~~~~~ <--- HERE
    
            # keep only topk scoring predictions
  File "/home/aurelien/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 74, in batched_nms
        return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
    else:
        return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  File "/home/aurelien/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40, in nms
        _log_api_usage_once(nms)
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Python, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, Tracer, AutocastCPU, Autocast, Batched, VmapMode, Functionalize].

CPU: registered at /home/aurelien/Downloads/vision/torchvision/csrc/ops/cpu/nms_kernel.cpp:112 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:47 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradLazy: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
AutogradXPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradMLC: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradHPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:293 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:461 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:305 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1059 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:52 [backend fallback]


Exception raised from reportError at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:434 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f21dcc040eb in /opt/libtorch/lib/libc10.so)
frame #1: c10::impl::OperatorEntry::reportError(c10::DispatchKey) const + 0xa48 (0x7f215c2e2d98 in /opt/libtorch/lib/libtorch_cpu.so)
frame #2: c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const + 0x44e (0x7f215e9dc1ae in /opt/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x3527b72 (0x7f215e641b72 in /opt/libtorch/lib/libtorch_cpu.so)
frame #4: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x52 (0x7f215e62fd02 in /opt/libtorch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x3508f4d (0x7f215e622f4d in /opt/libtorch/lib/libtorch_cpu.so)
frame #6: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const + 0x194 (0x7f215e2d2da4 in /opt/libtorch/lib/libtorch_cpu.so)
frame #7: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0xc3 (0x555faadc7cb5 in ./test_frcnn_tracing)
frame #8: main + 0x592 (0x555faadc31b0 in ./test_frcnn_tracing)
frame #9: __libc_start_main + 0xf3 (0x7f21222460b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #10: _start + 0x2e (0x555faadc28ee in ./test_frcnn_tracing)

Aborted (core dumped)

I'm not sure what I'm missing! I've seen other people having this issue, but I have not been able to resolve it.
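
For readers comparing the two crashes above: they look like different failure modes. The first ("Unknown builtin op: torchvision::nms") usually indicates that nothing had registered the operator name in the process when the model was loaded. The second says the name is known but only a CPU kernel is registered for it. The sketch below is a toy model of that distinction in plain C++. It is not PyTorch's dispatcher, and ToyRegistry, Backend, Kernel and try_call are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy model only: hypothetical types, not PyTorch's dispatcher API.
enum class Backend { CPU, CUDA };

using Kernel = std::function<int(int)>;

class ToyRegistry {
 public:
  // Record a kernel for one (operator name, backend) pair.
  void register_kernel(const std::string& op, Backend backend, Kernel kernel) {
    assert(static_cast<bool>(kernel));  // reject empty kernels
    kernels_[op][backend] = std::move(kernel);
  }

  // Look up and run a kernel; there are two distinct ways to fail.
  int call(const std::string& op, Backend backend, int arg) const {
    auto op_it = kernels_.find(op);
    if (op_it == kernels_.end()) {
      // Analogous to "Unknown builtin op": the name was never registered.
      throw std::runtime_error("Unknown op: " + op);
    }
    auto kernel_it = op_it->second.find(backend);
    if (kernel_it == op_it->second.end()) {
      // Analogous to "Could not run ... from the 'CUDA' backend".
      throw std::runtime_error("No kernel for requested backend: " + op);
    }
    return kernel_it->second(arg);
  }

 private:
  std::map<std::string, std::map<Backend, Kernel>> kernels_;
};

// Returns "ok" on success, otherwise the error message of the failed call.
std::string try_call(const ToyRegistry& registry, const std::string& op,
                     Backend backend) {
  try {
    registry.call(op, backend, 1);
    return "ok";
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

In this model, calling an operator on an empty registry gives the "unknown op" error. After registering only a CPU kernel, a CPU call succeeds and a CUDA call gives the "no kernel for requested backend" error, which mirrors the backend list in the second trace where CPU is present and CUDA is absent.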

Versions

LibTorch version 1.11.0 downloaded from the site.
Tested with TorchVision 0.11 and 0.12, each built from its own branch with cmake -DWITH_CUDA=on -DCMAKE_PREFIX_PATH=/opt/libtorch/share/cmake/Torch ..
Pop-OS 20.04
CUDA 11.3
