
Jetson: cuda runtime error (7) : too many resources requested for launch #24953

Closed
sdimantsd opened this issue Aug 21, 2019 · 6 comments
Labels
module: cuda (related to torch.cuda, and CUDA support in general), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@sdimantsd

🐛 Bug

I'm using a Jetson TX2 and a Jetson Nano (this problem happens on both of them).

$ python3 eval.py --trained_model=weights/yolact_im700_54_800000.pth --score_threshold=0.3 --top_k=10 --image=data/yolact_example_0.png:asd.jpg
Config not specified. Parsed yolact_im700_config from the file name.

Loading model... Done.
THCudaCheck FAIL file=/media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu line=67 error=7 : too many resources requested for launch
Traceback (most recent call last):
File "eval.py", line 1020, in
evaluate(net, dataset)
File "eval.py", line 795, in evaluate
evalimage(net, inp, out)
File "eval.py", line 562, in evalimage
batch = FastBaseTransform()(frame.unsqueeze(0))
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/ws/DL/yolact/utils/augmentations.py", line 618, in forward
img = F.interpolate(img, (cfg.max_size, cfg.max_size), mode='bilinear', align_corners=False)
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2563, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:67

$ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.0.0a0+bb15580
Is debug build: No
CUDA used to build PyTorch: 10.0.117

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.166
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: /usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1

Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.0.0a0+bb15580
[pip3] torchvision==0.3.0
[conda] Could not collect

I saw the suggested solution of changing "num_threads", but I can't find any variable named "num_threads".

Can you help me? Thanks!
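
For reference, the failing call can be reproduced in isolation, outside the full yolact pipeline. A minimal sketch (the 700×700 target size comes from yolact_im700_config; the 480×640 input frame shape is just an assumption for illustration):

```python
import torch
import torch.nn.functional as F

# Single NCHW frame; 480x640 is an assumed camera resolution.
img = torch.randn(1, 3, 480, 640, device='cuda')
# Same call as in utils/augmentations.py (cfg.max_size == 700 for im700).
out = F.interpolate(img, size=(700, 700), mode='bilinear', align_corners=False)
torch.cuda.synchronize()  # force the kernel launch so the error surfaces here
```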

@zhangguanheng66
Contributor

@VitalyFedyunin

zhangguanheng66 added the module: cuda and triaged labels Aug 21, 2019
@dylanbespalko
Contributor

I have also noticed an uptick in the number of tests that don't run on the Jetson TX2.

test_autograd.py
Ran 1010 tests in 1806.564s
FAILED (errors=4, skipped=6, expected failures=1)

test_nn.py
Ran 1453 tests in 1400.445s
FAILED (failures=2, errors=61, skipped=64, expected failures=2)

Would it be possible to reduce the resources by decreasing tensor sizes?

Attached is the test log: tx2_too_many_resources.txt

ezyang changed the title from "cuda runtime error (7) : too many resources requested for launch" to "Jetson: cuda runtime error (7) : too many resources requested for launch" Sep 1, 2019
@ezyang
Contributor

ezyang commented Sep 1, 2019

Can you try some of the workarounds posted in #8103 and see if they help? We might need to adjust thread size choices for Jetson.

More generally, we haven't been testing against Jetson in our CI, so this kind of breakage is going to keep happening until we do so. Maybe we should; I'm not sure about relative priority for Jetson.
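
One workaround of that kind, sketched here (untested on Jetson; the helper name is hypothetical), is to run just the failing resize on the CPU and move the result back to the GPU. It is slower, but it sidesteps the problematic kernel launch entirely:

```python
import torch
import torch.nn.functional as F

def resize_on_cpu(img, size):
    """Hypothetical stopgap: do the bilinear resize on the CPU, then
    return the result to the input tensor's original device."""
    out = F.interpolate(img.cpu(), size=size, mode='bilinear', align_corners=False)
    return out.to(img.device)
```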

@ezyang
Contributor

ezyang commented Sep 1, 2019

Other examples of us having fixed this type of problem: #17144, #7680

@t-vi
Collaborator

t-vi commented Apr 14, 2021

> PyTorch Version: 1.1.0 / 1.0.0

So we've added launch bounds to the failing kernels in PyTorch 1.2+ (#19630). I verified that with a recent PyTorch and the current JetPack CUDA 10.2, the model apparently used by @sdimantsd above (https://github.com/dbolya/yolact/) no longer shows the error.
Thus I'm closing this bug.
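
For anyone wanting to confirm the fix on their own board, a quick sanity check along these lines should run cleanly on PyTorch 1.2+ (a sketch; the sizes are assumed to match the original report):

```python
import torch
import torch.nn.functional as F

print(torch.__version__)  # the launch-bounds fix (#19630) landed in 1.2+
x = torch.randn(1, 3, 480, 640, device='cuda')
y = F.interpolate(x, size=(700, 700), mode='bilinear', align_corners=False)
torch.cuda.synchronize()  # any launch failure would surface here
print('bilinear upsample OK:', tuple(y.shape))
```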

t-vi closed this as completed Apr 14, 2021
@t-vi
Collaborator

t-vi commented Apr 14, 2021

(I should add: If you still see the bug, please don't hesitate to re-open / file a new one.)
