
Jetson: cuda runtime error (7) : too many resources requested for launch #24953

Closed
sdimantsd opened this issue Aug 21, 2019 · 6 comments
Labels
module: cuda (related to torch.cuda, and CUDA support in general), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@sdimantsd

🐛 Bug

I'm using a Jetson TX2 and a Jetson Nano (this problem happens on both of them).

$ python3 eval.py --trained_model=weights/yolact_im700_54_800000.pth --score_threshold=0.3 --top_k=10 --image=data/yolact_example_0.png:asd.jpg
Config not specified. Parsed yolact_im700_config from the file name.

Loading model... Done.
THCudaCheck FAIL file=/media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu line=67 error=7 : too many resources requested for launch
Traceback (most recent call last):
File "eval.py", line 1020, in
evaluate(net, dataset)
File "eval.py", line 795, in evaluate
evalimage(net, inp, out)
File "eval.py", line 562, in evalimage
batch = FastBaseTransform()(frame.unsqueeze(0))
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/ws/DL/yolact/utils/augmentations.py", line 618, in forward
img = F.interpolate(img, (cfg.max_size, cfg.max_size), mode='bilinear', align_corners=False)
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2563, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:67

$ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.0.0a0+bb15580
Is debug build: No
CUDA used to build PyTorch: 10.0.117

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.166
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: /usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1

Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.0.0a0+bb15580
[pip3] torchvision==0.3.0
[conda] Could not collect

I saw the suggested solution of changing "num_threads", but I can't find any variable named "num_threads".

Can you help me? Thanks!
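
For reference, the failing call can be reproduced in isolation, outside the full yolact pipeline. A minimal sketch (the 700×700 target size comes from yolact_im700_config; the 480×640 input frame shape is just an assumption for illustration):

```python
import torch
import torch.nn.functional as F

# Single NCHW frame; 480x640 is an assumed camera resolution.
img = torch.randn(1, 3, 480, 640, device='cuda')
# Same call as in utils/augmentations.py (cfg.max_size == 700 for im700).
out = F.interpolate(img, size=(700, 700), mode='bilinear', align_corners=False)
torch.cuda.synchronize()  # force the kernel launch so the error surfaces here
```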

@zhangguanheng66
Contributor

@VitalyFedyunin

zhangguanheng66 added the module: cuda and triaged labels Aug 21, 2019
@dylanbespalko
Contributor

I have also noticed an uptick in the number of tests that don't run on the Jetson TX2.

test_autograd.py
Ran 1010 tests in 1806.564s
FAILED (errors=4, skipped=6, expected failures=1)

test_nn.py
Ran 1453 tests in 1400.445s
FAILED (failures=2, errors=61, skipped=64, expected failures=2)

Would it be possible to reduce the resources by decreasing tensor sizes?

Attached is the test log: tx2_too_many_resources.txt

ezyang changed the title from "cuda runtime error (7) : too many resources requested for launch" to "Jetson: cuda runtime error (7) : too many resources requested for launch" Sep 1, 2019
@ezyang
Contributor

ezyang commented Sep 1, 2019

Can you try some of the workarounds posted in #8103 and see if they help? We might need to adjust thread size choices for Jetson.

More generally, we haven't been testing against Jetson in our CI, so this kind of breakage is going to keep happening until we do so. Maybe we should; I'm not sure about relative priority for Jetson.
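
One workaround of that kind, sketched here (untested on Jetson; the helper name is hypothetical), is to run just the failing resize on the CPU and move the result back to the GPU. It is slower, but it sidesteps the problematic kernel launch entirely:

```python
import torch
import torch.nn.functional as F

def resize_on_cpu(img, size):
    """Hypothetical stopgap: do the bilinear resize on the CPU, then
    return the result to the input tensor's original device."""
    out = F.interpolate(img.cpu(), size=size, mode='bilinear', align_corners=False)
    return out.to(img.device)
```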

@ezyang
Contributor

ezyang commented Sep 1, 2019

Other examples of us having fixed this type of problem: #17144, #7680

@t-vi
Collaborator

t-vi commented Apr 14, 2021

> PyTorch Version: 1.1.0 / 1.0.0

So we've added launch bounds to the failing kernels in PyTorch 1.2+ (#19630). I verified that with a recent PyTorch and the current JetPack CUDA 10.2, the model apparently used by @sdimantsd above (https://github.com/dbolya/yolact/) no longer shows the error.
Thus I'm closing this bug.
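
For anyone wanting to confirm the fix on their own board, a quick sanity check along these lines should run cleanly on PyTorch 1.2+ (a sketch; the sizes are assumed to match the original report):

```python
import torch
import torch.nn.functional as F

print(torch.__version__)  # the launch-bounds fix (#19630) landed in 1.2+
x = torch.randn(1, 3, 480, 640, device='cuda')
y = F.interpolate(x, size=(700, 700), mode='bilinear', align_corners=False)
torch.cuda.synchronize()  # any launch failure would surface here
print('bilinear upsample OK:', tuple(y.shape))
```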

t-vi closed this as completed Apr 14, 2021
@t-vi
Collaborator

t-vi commented Apr 14, 2021

(I should add: If you still see the bug, please don't hesitate to re-open / file a new one.)
