
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /pytorch/torch/lib/THCUNN/im2col.h:120 #7680

Closed
ShreyasSkandan opened this issue May 18, 2018 · 16 comments · Fixed by #7779
Labels
todo Not as important as medium or high priority tasks, but we will work on these.

Comments


ShreyasSkandan commented May 18, 2018

Issue description

I'm trying to run a variant of ERFNet on an NVIDIA TX-2 running Jetpack 3.2 (CUDA 9.0 and CuDNN 7).

I get the following error:
RuntimeError: cuda runtime error (7) : too many resources requested for launch at ../../../pytorch/torch/lib/THCUNN/im2col.h:120

Does this error indicate that the model plus overhead is too large for the GPU? But this is roughly a 700 MB model performing inference on a single 640x512 grayscale image, on a GPU with roughly 6.5 GB of free memory. I even tried training a new model on images at half that resolution and got the same error.

Any tips/feedback is appreciated.

  • PyTorch or Caffe2: PyTorch
  • How you installed PyTorch (conda, pip, source): Source
  • Build command you used (if compiling from source): Followed jetson-reinforcement github
  • OS: Ubuntu 16.04 on Nvidia TX2
  • PyTorch version: 0.3.0
  • Python version: 3.6
  • CUDA/cuDNN version: CUDA 9.0 , CUDNN 7
  • GPU models and configuration: Tegra X2
  • GCC version (if compiling from source): 5.4
  • CMake version: 3.5.1
  • Versions of any other relevant libraries: https://github.com/Eromera/erfnet_pytorch

soumith commented May 18, 2018

we should fix the launch parameters. I presume we can't use as many threads per block on the TX2 as we use on desktop GPUs.
@ngimel can you tell us what the limits of TX2 GPUs are for the fix to im2col


ngimel commented May 18, 2018

Threads per block and maximum blocks in the grid are actually the same for the TX2 as they are for desktop GPUs (https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications); only the number of registers is smaller. @dusty-nv, do you know what might be causing this?
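For reference, the relevant limits can be compared directly. This is a minimal sketch; the numbers follow the CUDA compute-capability tables (sm_61 for desktop Pascal, sm_62 for the TX2) and are meant as illustration, not as an authoritative spec:

```python
# Illustrative per-compute-capability limits, taken from NVIDIA's
# compute-capability tables (sm_61 = desktop Pascal, sm_62 = Tegra X2).
LIMITS = {
    "sm_61": {"max_threads_per_block": 1024, "max_regs_per_block": 64 * 1024},
    "sm_62": {"max_threads_per_block": 1024, "max_regs_per_block": 32 * 1024},
}

# Thread-count limits are identical on both chips...
assert (LIMITS["sm_61"]["max_threads_per_block"]
        == LIMITS["sm_62"]["max_threads_per_block"])
# ...but the TX2 allows only half as many registers per thread block.
assert (LIMITS["sm_62"]["max_regs_per_block"]
        == LIMITS["sm_61"]["max_regs_per_block"] // 2)
```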

ShreyasSkandan (Author) commented

@soumith: as @ngimel said, the number of threads per block is constant across different NVIDIA GPUs, the Tegra series included.

Is it possible that it was compiled to require more registers than are available on the TX2, and maybe the kernel invocation of im2col requires some sort of launch_bounds() qualifier?


soumith commented May 20, 2018

@ShreyasSkandan The TX2 is an arm64 platform, so I presume PyTorch was compiled from source. In that case, I don't think there's a chance it was compiled to require more registers than are available on the TX2. Seeing the build log would be helpful.

ShreyasSkandan (Author) commented

@soumith thanks for the quick response. I will try to dig up the build log tomorrow and post it here.


ngimel commented May 20, 2018

@soumith, if there are no launch bounds, it is in fact possible that a kernel is compiled to request more registers than are available. At compile time, the compiler does not know how many threads the kernel will be launched with, so it can potentially use too many registers per thread to satisfy runtime requirements later: e.g. launching with 512 or 1024 threads could fail (not even a single block can be placed on an SM), whereas launching with 256 would succeed.
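The failure mode described here is simple arithmetic: a launch fails when registers-per-thread times threads-per-block exceeds the per-block register limit. A hedged sketch of that check (the 40-registers-per-thread figure is a made-up example, not a value measured from im2col):

```python
def launch_fits(regs_per_thread: int, threads_per_block: int,
                max_regs_per_block: int) -> bool:
    """True if a single block of this size can be scheduled at all."""
    return regs_per_thread * threads_per_block <= max_regs_per_block

REGS = 40  # hypothetical compiler choice when no launch bounds are given

# On a desktop GPU (64K registers per block) a 1024-thread launch fits...
assert launch_fits(REGS, 1024, 64 * 1024)
# ...on the TX2 (32K registers per block) the same launch fails (error 7)...
assert not launch_fits(REGS, 1024, 32 * 1024)
# ...while a 256-thread launch of the very same kernel binary succeeds.
assert launch_fits(REGS, 256, 32 * 1024)
```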

@zou3519 zou3519 added the todo Not as important as medium or high priority tasks, but we will work on these. label May 21, 2018

soumith commented May 21, 2018

@ngimel is there a way we can audit these GPU constraints from our server setup (i.e. without actually sitting down and compiling PyTorch on a TX2)?


ngimel commented May 21, 2018

Adding launch_bounds with the maximum number of threads the kernel is going to be launched with will keep the compiler from overusing registers. We had to do this, e.g., for the interp kernels when CUDA 9 started using more registers.
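In other words, `__launch_bounds__(maxThreadsPerBlock)` on a kernel tells the compiler that blocks may be as large as `maxThreadsPerBlock`, so it must keep per-thread register use low enough for a full-size block to fit. The implied ceiling can be sketched as (a simplification that ignores register allocation granularity; the 32K/64K limits are the per-block register limits of sm_62 vs. desktop Pascal):

```python
def reg_cap_per_thread(max_regs_per_block: int,
                       max_threads_per_block: int) -> int:
    """Upper bound on registers/thread implied by
    __launch_bounds__(max_threads_per_block), ignoring granularity."""
    return max_regs_per_block // max_threads_per_block

# With __launch_bounds__(1024), the compiler must stay within:
assert reg_cap_per_thread(32 * 1024, 1024) == 32   # TX2 (sm_62)
assert reg_cap_per_thread(64 * 1024, 1024) == 64   # desktop Pascal (sm_61)
```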

ShreyasSkandan (Author) commented

Ah, that's what I suspected, @ngimel. Thank you for clearing this up.
@soumith @ngimel do you think a fix will be released in the near future (within a week)?

Thanks for all the help

ShreyasSkandan (Author) commented

Works now, thanks!

ababycat commented

@ngimel hello, I hit the same error on a TX2 with PyTorch 0.3.0 compiled from source, but in a different file: 'cuda runtime error (7): too many resources requested for launch at ...../pytorch/torch/lib/THCUNN/generic/SpatialDilatedMaxPooling.cu'. Since this file is different from 'VolumetricUpSamplingTrilinear.cu', do I need to add 'launch_bounds(1024)' to every kernel in 'SpatialDilatedMaxPooling.cu'?
Thank you!

andrewssobral commented

Hello @ngimel @soumith,

I am also facing a similar issue:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/Downloads/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:66

Here's my setup:

  • How you installed PyTorch (conda, pip, source): Source
  • Build command you used (if compiling from source): Followed jetson-reinforcement github
  • OS: Ubuntu 16.04 on Nvidia TX2
  • PyTorch version: 0.5.0a0+a24163a (torchvision 0.2.1)
  • Python version: 3.5
  • CUDA/cuDNN version: CUDA 9.0 , CUDNN 7
  • GPU models and configuration: Tegra X2
  • GCC version (if compiling from source): 5.4
  • CMake version: 3.12.2

Source code:
https://github.com/andrewssobral/deep-learning-pytorch/blob/master/segmentation/train_binseg.py

Full log:

CUDA_ENABLED:  True
/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:225: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.functional.interpolate instead.")
/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
THCudaCheck FAIL file=/home/nvidia/Downloads/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu line=66 error=7 : too many resources requested for launch
Traceback (most recent call last):
  File "train_binseg.py", line 73, in <module>
    outputs = model(inputs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Downloads/deep-learning-pytorch/segmentation/networks/SegNet.py", line 73, in forward
    enc5 = self.enc5(dec5)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Downloads/deep-learning-pytorch/segmentation/networks/SegNet.py", line 33, in forward
    return self.encode(x)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py", line 226, in forward
    return super(UpsamplingBilinear2d, self).forward(input)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py", line 123, in forward
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
  File "/home/nvidia/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1985, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/Downloads/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:66
nvidia@tegra-ubuntu:~/Downloads/deep-learning-pytorch/segmentation$

Do you know what it could be?

andrewssobral commented

I know that 'SpatialUpSamplingBilinear.cu' without launch_bounds(1024) leads to this error, but I don't know how to fix it...

I tried the solution in #8103, but it is still not working (after recompilation).

andrewssobral commented

My issue was solved by following #8103 (comment)


MrLinNing commented Dec 27, 2018

@ngimel @ShreyasSkandan Hi, I have the same problem. Can it be solved by cloning the latest PyTorch, changing CUDA_NUM_THREADS = 256 in the two files shown in the attached screenshot, and recompiling?
[screenshot of the two files]
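Lowering CUDA_NUM_THREADS trades smaller blocks for a larger grid: the kernel covers the same number of elements with more, cheaper blocks. A rough sketch of the trade-off (REGS_PER_THREAD = 40 is a hypothetical figure, and 32K is the TX2's per-block register limit):

```python
def grid_size(n_elements: int, threads_per_block: int) -> int:
    """Blocks needed to cover n_elements, one element per thread."""
    return (n_elements + threads_per_block - 1) // threads_per_block

N = 640 * 512            # e.g. one 640x512 grayscale image
REGS_PER_THREAD = 40     # made-up figure for illustration

# 256-thread blocks need 4x as many blocks as 1024-thread blocks...
assert grid_size(N, 256) == 4 * grid_size(N, 1024)
# ...but each block's register demand drops under the TX2's 32K limit,
# which the 1024-thread configuration exceeds.
assert REGS_PER_THREAD * 256 <= 32 * 1024 < REGS_PER_THREAD * 1024
```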

dusty-nv commented

@MrLinNing that appears to fix some of the functions, but perhaps not all - I'm not sure. For more info, see:

#8103 (comment)
#8103 (comment)
