
Runtime error in torchvision nms in Linux. Windows works fine #1705

Open
adizhol opened this issue Dec 29, 2019 · 5 comments

Comments

@adizhol

adizhol commented Dec 29, 2019

Hi,

I'm getting a runtime error when running torchvision\ops\boxes.nms.
torchvision 0.4.0
pytorch 1.2.0 (GPU)

RuntimeError: Trying to create tensor with negative dimension -532064992: [-532064992] (check_size_nonnegative at /pytorch/aten/src/ATen/native/TensorFactories.h:64)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fd8e1d79273 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: at::native::empty_cuda(c10::ArrayRef, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0xb76 (0x7fd7d85e8fb6 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: + 0x3f7da58 (0x7fd7d6f7fa58 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: torch::autograd::VariableType::empty(c10::ArrayRef, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x3fa (0x7fd7d69fa75a in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #4: + 0x7f661 (0x7fd87b3fd661 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x430 (0x7fd87b3fe042 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: nms(at::Tensor const&, at::Tensor const&, float) + 0x172 (0x7fd87b3c1cf9 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x65115 (0x7fd87b3e3115 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x62304 (0x7fd87b3e0304 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #9: + 0x5dc45 (0x7fd87b3dbc45 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0x5ded2 (0x7fd87b3dbed2 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: + 0x4f2e7 (0x7fd87b3cd2e7 in /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so)

To Reproduce
code:

nms(transformed_anchors, scores, iou_threshold=0.7)

transformed_anchors is [490698, 4]
scores is [490698]

transformed_anchors.min() = 0
transformed_anchors.max() = 2560
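
For completeness, a self-contained sketch of the call above; the boxes here are random stand-ins for the real anchors (which are not included in the report), chosen only to match the reported shapes and value range of roughly [0, 2560]:

import torch
from torchvision.ops import nms

N = 490698
xy1 = torch.rand(N, 2, device="cuda") * 2500             # top-left corners
wh = torch.rand(N, 2, device="cuda") * 60                # widths / heights
transformed_anchors = torch.cat([xy1, xy1 + wh], dim=1)  # [N, 4]
scores = torch.rand(N, device="cuda")                    # [N]

# Raises the negative-dimension RuntimeError on the affected versions (CUDA only)
keep = nms(transformed_anchors, scores, iou_threshold=0.7)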

Environment

[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.16.4
[pip] torch==1.2.0
[pip] torchtext==0.4.0
[pip] torchvision==0.4.0
[conda] magma-cuda100 2.1.0 5 local
[conda] mkl 2019.1 144
[conda] mkl-include 2019.1 144
[conda] torch 1.2.0 pypi_0 pypi
[conda] torchtext 0.4.0 pypi_0 pypi
[conda] torchvision 0.4.0 pypi_0 pypi

Additional context
The same code on Windows runs as expected.

The same issue was opened in pytorch/pytorch.

@fmassa
Member

fmassa commented Jan 3, 2020

Thanks for the bug report!

I believe the issue is that we use int in

int dets_num = dets.size(0);
const int col_blocks = at::cuda::ATenCeilDiv(dets_num, threadsPerBlock);

instead of long. That said, I'm not sure the current strategy will work for such large tensors, as it might require a bit too much memory.

EDIT: I tried a quick fix by replacing int with int64_t, but I get CUDA out of memory errors. So for this use-case, the current implementation is not enough and a new implementation might be required.
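
To see why the allocated size goes negative, here is a rough back-of-the-envelope check (a sketch; it assumes threadsPerBlock = 64 and that the kernel allocates a bitmask of dets_num * col_blocks entries, per the snippet above):

import math

dets_num = 490698
threads_per_block = 64                                 # assumed kernel block size
col_blocks = math.ceil(dets_num / threads_per_block)   # 7668

mask_size = dets_num * col_blocks                      # 3762672264
print(mask_size > 2**31 - 1)                           # True: exceeds int32 max

# Interpreted as a signed 32-bit int, the size wraps around to a negative value
wrapped = (mask_size + 2**31) % 2**32 - 2**31
print(wrapped)                                         # -532295032

The wrapped value is in the same ballpark as the -532064992 in the report (the exact box count in the failing run may have differed slightly), which is consistent with the int overflow explanation.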

@shuangshuangguo

Hi @adizhol @fmassa, did you solve the problem?

@senarvi

senarvi commented Aug 4, 2021

Even if the implementation doesn't scale up to larger sizes, it would be great to get a better error message.
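
Until the kernel is fixed, a user-side workaround along the following lines can keep the call within int32 range by pre-filtering on score; this is only an illustrative sketch (safe_nms and pre_nms_top_n are made-up names, and threadsPerBlock = 64 is assumed from the CUDA kernel), not torchvision API:

import torch
from torchvision.ops import nms

INT32_MAX = 2**31 - 1
THREADS_PER_BLOCK = 64  # assumed kernel block size

def safe_nms(boxes, scores, iou_threshold, pre_nms_top_n=100_000):
    n = boxes.shape[0]
    col_blocks = (n + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    if n * col_blocks > INT32_MAX:
        # The CUDA mask allocation would overflow int32; keep only the
        # highest-scoring boxes so the call stays within range.
        scores, idx = scores.topk(pre_nms_top_n)
        boxes = boxes[idx]
        keep = nms(boxes, scores, iou_threshold)
        return idx[keep]
    return nms(boxes, scores, iou_threshold)

Pre-filtering by score only changes the result with respect to the low-scoring boxes that are dropped, which is usually acceptable for proposal pruning.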

@fmassa
Member

fmassa commented Aug 9, 2021

Hi @senarvi,

I agree that this error message is not great. We should at least make the error message more meaningful by replacing the int with int64_t.

@abhiagwl4262

abhiagwl4262 commented Nov 13, 2021

@adizhol

The range of int32 is -2147483648 to 2147483647, so why does the error occur in your case with only 490698 boxes?

sidharth5n added a commit to sidharth5n/Neural-CTRLF that referenced this issue Jan 25, 2022
NMS implementation has issues when the number of proposals is large.
pytorch/vision#1705