RuntimeError: cuda runtime error (7) : too many resources requested for launch at /pytorch/torch/lib/THCUNN/im2col.h:120 #7680
Comments
We should fix the launch parameters. I presume we can't use as many threads per block on the TX2 as we use on desktop GPUs.
Threads per block and maximum blocks in the grid are actually the same for the TX2 as for desktop GPUs (https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications); only the number of registers is smaller. @dusty-nv, do you know what might be causing this?
@soumith: as @ngimel said, the number of threads per block is constant across different NVIDIA GPUs, the Tegra series included. Is it possible that it was compiled to require more registers than are available on the TX2, and maybe the kernel invocation of im2col requires some sort of launch_bounds() annotation?
@ShreyasSkandan The TX2 is an arm64 platform, so I presume PyTorch was compiled from source. In that case, I don't think there's any chance it was compiled to require more registers than are available on the TX2. Seeing the build log would be helpful.
@soumith thanks for the quick response. I will try to dig up the build log tomorrow and post it here.
@soumith, if there are no launch bounds, it is in fact possible for a kernel to be compiled to request more registers than are available. At compile time, the compiler does not know how many threads the kernel will be launched with, so it can use too many registers per thread to satisfy the runtime requirements later: e.g., launching with 512 or 1024 threads could fail (not even a single block can be placed on an SM), whereas launching with 256 would succeed.
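To make this concrete, the CUDA runtime can report the register footprint the compiler actually chose and the largest block size that footprint still allows. A minimal sketch, assuming a hypothetical stand-in kernel (not the real im2col):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a register-hungry kernel like im2col.
__global__ void some_kernel(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main() {
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, some_kernel);
    // numRegs: registers per thread the compiler decided to use.
    printf("registers per thread: %d\n", attr.numRegs);
    // maxThreadsPerBlock: largest block size that can still launch given
    // that register usage; anything bigger fails at launch with
    // "too many resources requested for launch".
    printf("max launchable block size: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```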
@ngimel is there a way we can audit those GPU constraints from our server setup (i.e. without actually sitting down and compiling PyTorch on the TX2)?
Adding launch_bounds with the maximum number of threads the kernel is going to be launched with will keep the compiler from overusing registers. We had to do this, e.g., for the interp kernels when CUDA 9 started using more registers.
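For illustration, this is roughly what that annotation looks like on a kernel. The kernel name and body below are placeholders, and 1024 is an assumption matching the default block size discussed in this thread:

```cuda
constexpr int CUDA_NUM_THREADS = 1024;  // assumed max block size at launch

// __launch_bounds__ promises the compiler the kernel is never launched with
// more than this many threads per block, forcing it to cap register usage
// so that a full block can always be placed on an SM.
__global__ void __launch_bounds__(CUDA_NUM_THREADS)
placeholder_kernel(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}
```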
Works now, thanks!
@ngimel hello, I hit the same error on the TX2 with PyTorch 0.3.0 compiled from source, but the error is this: 'cuda runtime error (7): too many resources requested for launch at ...../pytorch/torch/lib/THCUNN/generic/SpatialDilatedMaxPooling.cu'. The file is different from 'VolumetricUpSamplingTrilinear.cu'; does that mean I need to add 'launch_bounds(1024)' to every function in 'SpatialDilatedMaxPooling.cu'?
I am also facing a similar issue:
Here's my setup:
Source code:
Full log:
Do you know what it could be?
I know that 'SpatialUpSamplingBilinear.cu' without launch_bounds(1024) leads to this error, but I don't know how to fix it... I tried the solution from #8103, but it is still not working (after recompilation).
My issue was solved by following #8103 (comment).
@ngimel @ShreyasSkandan Hi, I have the same problem. Can it be solved by cloning the latest PyTorch, changing CUDA_NUM_THREADS = 256 in the two files below, and compiling?
@MrLinNing that appears to fix some of the functions, but perhaps not all - I'm not sure. For more info, see:
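For context, that workaround targets the launch helper used throughout THCUNN. A sketch of the pattern, assuming the names match the PyTorch source of that era (CUDA_NUM_THREADS, GET_BLOCKS):

```cuda
// Lowering CUDA_NUM_THREADS from 1024 to 256 shrinks the per-block register
// demand (registers/thread x threads/block), so blocks fit on an SM again,
// at the cost of launching more blocks.
constexpr int CUDA_NUM_THREADS = 256;  // was 1024 before the workaround

inline int GET_BLOCKS(int n) {
    // Ceiling division: enough blocks to cover all n elements.
    return (n + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
}

// Kernels are then launched as:
//   my_kernel<<<GET_BLOCKS(n), CUDA_NUM_THREADS>>>(args...);
```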
Issue description
I'm trying to run a variant of ERFNet on an NVIDIA Jetson TX2 running JetPack 3.2 (CUDA 9.0 and cuDNN 7).
I get the following error:
RuntimeError: cuda runtime error (7) : too many resources requested for launch at ../../../pytorch/torch/lib/THCUNN/im2col.h:120
Does this error indicate that the model size plus overhead is too large for the GPU? But this is roughly a 700 MB model performing inference on a single 640x512 grayscale image, on a GPU that has roughly 6.5 GB free. I even tried training a new model on images at half that resolution and got the same error.
Any tips/feedback is appreciated.