
DCN on Jetson TX2 #3041

Open
MauroPfister opened this issue Jun 16, 2020 · 4 comments
Labels: community help wanted

Comments

@MauroPfister

Hi

I am trying to use the deformable convolutions from this repo on a Jetson TX2. Compilation was successful and I can also run them from Python. However, for every call of the DCN I get the following error:
error in deformable_im2col: too many resources requested for launch

I was wondering if there are any settings in the .cu files that I can change to fix this error?

Minimal reproducible example

# Execute from parent directory of ops folder

import torch
from ops.dcn import DeformConvPack

device = torch.device('cuda')
dcn = DeformConvPack(in_channels=256,
                     out_channels=256,
                     kernel_size=(3, 3),
                     padding=1).to(device)
input = torch.Tensor(16, 256, 26, 20).to(device)  # shape (N, C, H, W)
output = dcn(input)

Environment

  • Jetson TX2 with JetPack 4.3
  • Python 3.6.9
  • PyTorch 1.4
  • Torchvision 0.5

Since I only wanted to install the DCN ops rather than the whole repo, I used a reduced setup.py (copied from this repo):

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='mmdet',
    ext_modules=[
        CUDAExtension('deform_conv_cuda', [
            'src/deform_conv_cuda.cpp',
            'src/deform_conv_cuda_kernel.cu'
        ]),
        CUDAExtension('deform_pool_cuda', [
            'src/deform_pool_cuda.cpp',
            'src/deform_pool_cuda_kernel.cu'
        ])
    ],
    cmdclass={
        'build_ext': BuildExtension
    })

Bug fix
After a quick search on Google I found this PyTorch issue, which seems related. Unfortunately I have no experience with CUDA at all, so I am not sure whether it helps.

@MauroPfister
Author

I was able to solve the issue by changing CUDA_NUM_THREADS from 1024 to 512 in the kernel source and recompiling:

const int CUDA_NUM_THREADS = 512;  // was 1024

The regular PyTorch convolutions do not seem to have this problem. Maybe the CUDA_NUM_THREADS constant could be set depending on which architecture the DCNs are built for?
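
For example, something like the CUDA occupancy API might be able to pick a per-kernel block size automatically instead of relying on a hard-coded constant. A rough, untested sketch of the idea (example_kernel and launch_example are placeholder names, not code from this repo):

#include <cuda_runtime.h>

// Hypothetical stand-in for a kernel such as deformable_im2col.
__global__ void example_kernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * 2.0f;
}

void launch_example(const float* in, float* out, int n) {
  int min_grid_size = 0;
  int block_size = 0;
  // Ask the runtime for the largest block size this particular kernel can be
  // launched with on the current device, given its register and shared memory usage.
  cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size,
                                     example_kernel, 0, 0);
  int grid_size = (n + block_size - 1) / block_size;
  example_kernel<<<grid_size, block_size>>>(in, out, n);
}

Since the occupancy query accounts for the kernel's actual resource usage, it should return something below 1024 on devices like the TX2 where the full block size cannot be launched.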

@hellock
Member

hellock commented Jun 19, 2020

Thanks for the report! It is a known issue that setting CUDA_NUM_THREADS to 1024 causes this failure on some older or resource-limited devices. We have not found a good way to set it according to the GPU architecture. PRs are welcome if you have any ideas.

@MauroPfister
Author

I don't have any experience with PyTorch CUDA extensions, so unfortunately I can't help with a PR. But maybe this could be mentioned in a README somewhere? That way people could easily fix the issue themselves.

@hhaAndroid added the community help wanted label on Apr 19, 2021
@jshilong
Collaborator

Thanks for the report! I will add it to the FAQ to help people locate this problem faster.
