Jetson: cuda runtime error (7) : too many resources requested for launch #24953
Comments
I have also noticed an uptick in the number of tests that don't run on the Jetson TX2, in test_autograd.py and test_nn.py. Would it be possible to reduce the resources requested by decreasing tensor sizes? The test log is attached.
Can you try some of the workarounds posted in #8103 and see if they help? We might need to adjust the thread count choices for Jetson. More generally, we haven't been testing against Jetson in our CI, so this kind of breakage is going to keep happening until we do. Maybe we should; I'm not sure about the relative priority of Jetson.
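To illustrate why the thread count matters here, below is a rough Python sketch of the register budget behind "too many resources requested for launch". It assumes a limit of 32K registers per block, which is what I believe the TX2 and Nano report, but please check it with the CUDA deviceQuery sample. The helper names and the per-thread register counts are made up for illustration, and the model ignores the hardware's register allocation granularity, so treat it as an estimate rather than what the driver actually computes.

```python
# Simplified model: a launch needs threads_per_block * regs_per_thread registers.
# Real GPUs allocate registers in per-warp granules, so this is only an estimate.

# ASSUMPTION: per-block register limit on the target device.
# Verify with the CUDA deviceQuery sample before relying on it.
ASSUMED_REGS_PER_BLOCK = 32 * 1024


def registers_needed(threads_per_block, regs_per_thread):
    """Registers one thread block would request under the simplified model."""
    return threads_per_block * regs_per_thread


def launch_fits(threads_per_block, regs_per_thread,
                regs_per_block=ASSUMED_REGS_PER_BLOCK):
    """True if the block's register demand stays within the per-block limit."""
    return registers_needed(threads_per_block, regs_per_thread) <= regs_per_block


def largest_fitting_block(regs_per_thread, max_threads=1024, warp_size=32,
                          regs_per_block=ASSUMED_REGS_PER_BLOCK):
    """Halve the block size from max_threads until the launch fits.

    Returns 0 if not even a single warp fits.
    """
    threads = max_threads
    while threads >= warp_size:
        if launch_fits(threads, regs_per_thread, regs_per_block):
            return threads
        threads //= 2
    return 0


if __name__ == "__main__":
    # Hypothetical per-thread register counts, not measured from PyTorch kernels.
    for regs in (32, 40, 64):
        print(regs, launch_fits(1024, regs), largest_fitting_block(regs))
```

Under this model there are two ways to get back under the budget: launch fewer threads per block, or have the compiler cap the registers used per thread (which is what `__launch_bounds__` on a CUDA kernel asks for).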
We've added launch bounds to the kernels failing here in PyTorch 1.2+ (#19630). I verified that with a recent PyTorch and the current JetPack (CUDA 10.2), the model apparently used by @sdimantsd (https://github.com/dbolya/yolact/) no longer shows the error.
(I should add: if you still see the bug, please don't hesitate to re-open this issue or file a new one.)
🐛 Bug
I'm using a Jetson TX2 and a Jetson Nano (this problem happened on both of them).
$ python3 eval.py --trained_model=weights/yolact_im700_54_800000.pth --score_threshold=0.3 --top_k=10 --image=data/yolact_example_0.png:asd.jpg
Config not specified. Parsed yolact_im700_config from the file name.
Loading model... Done.
THCudaCheck FAIL file=/media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu line=67 error=7 : too many resources requested for launch
Traceback (most recent call last):
File "eval.py", line 1020, in <module>
evaluate(net, dataset)
File "eval.py", line 795, in evaluate
evalimage(net, inp, out)
File "eval.py", line 562, in evalimage
batch = FastBaseTransform()(frame.unsqueeze(0))
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/ws/DL/yolact/utils/augmentations.py", line 618, in forward
img = F.interpolate(img, (cfg.max_size, cfg.max_size), mode='bilinear', align_corners=False)
File "/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2563, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:67
$ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.0.0a0+bb15580
Is debug build: No
CUDA used to build PyTorch: 10.0.117
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.166
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: /usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1
Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.0.0a0+bb15580
[pip3] torchvision==0.3.0
[conda] Could not collect
I saw the suggested solution of changing "num_threads", but I can't find any variable named "num_threads".
Can you help me?
Thanks!