MMCV, RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found. #22

Closed
HaoranZhuExplorer opened this issue Apr 3, 2024 · 2 comments


@HaoranZhuExplorer

Dear authors,

Thank you for your contribution!

I set up the environment according to your README and the requirements.txt you provided in previous issues. However, when I try to run the training script:
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
it raises the following MMCV error, even though I'm running in a CUDA environment:

  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
    out = _inner_forward(x)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 274, in _inner_forward
    out = self.conv2(out)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 251, in forward
    return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 73, in forward
    ext_module.modulated_deform_conv_forward(
RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.

The full error log is available here: error_log.txt. Line 55 of the log shows "MMCV CUDA Compiler: not available", which may be the cause. Note that I'm running the codebase on a SLURM GPU cluster: the login node has no GPU, so I have to request GPU resources from the scheduler. I ran the script after the GPU resources were allocated, but the error above still occurs.
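Before reinstalling anything, it can help to confirm which CUDA version each package actually sees. The sketch below is a hypothetical helper (`collect_cuda_info` is not part of MMCV). It assumes PyTorch and mmcv-full 1.x, where `mmcv.ops.get_compiling_cuda_version()` returns "not available" when the CUDA ops were not compiled; that is the value behind the "MMCV CUDA Compiler" line in the log. On a SLURM cluster, run it inside the GPU allocation (for example through `srun`) so that `torch.cuda.is_available()` describes the compute node rather than the login node.

```python
def collect_cuda_info():
    """Hypothetical helper: report the CUDA versions PyTorch and mmcv-full see.

    Any value that cannot be determined is left as None.
    """
    info = {
        "torch": None,
        "torch_cuda": None,           # CUDA version PyTorch was built with
        "cuda_available": None,       # whether a GPU is visible on this node
        "mmcv": None,
        "mmcv_compiling_cuda": None,  # CUDA version the mmcv-full ops were built with
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    try:
        import mmcv
        info["mmcv"] = mmcv.__version__
        # Assumes mmcv-full 1.x; importing mmcv.ops fails if the compiled
        # extension is missing or broken.
        from mmcv.ops import get_compiling_cuda_version
        info["mmcv_compiling_cuda"] = get_compiling_cuda_version()
    except (ImportError, OSError):
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_info().items():
        print(f"{key}: {value}")
```

If `mmcv_compiling_cuda` prints "not available" while `cuda_available` is True, the installed mmcv-full has no CUDA kernels, which would explain the "implementation for device cuda:0 not found" error.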

Following this link, I also tried installing the CUDA build of mmcv-full with the following command:
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu112/torch1.10/index.html
but it still gives me the same error.

Is there any way to solve this issue? Thanks!

Best regards

@tomztyang
Contributor

Hi,

It seems something is wrong with your mmcv installation. In my opinion, you should: (1) check that your CUDA version matches the CUDA version of the pre-built mmcv-full package; (2) rebuild or reinstall the environment after the GPU resources have been assigned.

Best,
Zetong
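One way to carry out check (1) is to compare the CUDA version PyTorch was built with (`torch.version.cuda`) against the one the mmcv-full ops were compiled with (`mmcv.ops.get_compiling_cuda_version()` in mmcv-full 1.x). The helper below is a hypothetical sketch that takes those two strings as input. It assumes that agreement on major.minor is what matters, mirroring how the pre-built wheels are grouped under tags such as `cu111`.

```python
def major_minor(version):
    """Reduce a version string such as '11.1.105' or '11.1' to (11, 1)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def check_cuda_match(torch_cuda, mmcv_cuda):
    """Hypothetical check: compare PyTorch's CUDA version with mmcv-full's.

    torch_cuda: value of torch.version.cuda (None for CPU-only PyTorch).
    mmcv_cuda: value of mmcv.ops.get_compiling_cuda_version().
    """
    if torch_cuda is None:
        return "PyTorch is a CPU-only build"
    if mmcv_cuda is None or mmcv_cuda == "not available":
        return "mmcv-full was built without CUDA ops"
    if major_minor(torch_cuda) != major_minor(mmcv_cuda):
        return f"mismatch: torch uses CUDA {torch_cuda}, mmcv-full was built for {mmcv_cuda}"
    return "ok"
```

Anything other than "ok" suggests reinstalling mmcv-full with a wheel that matches PyTorch, or rebuilding it from source on a node where CUDA is available.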

@HaoranZhuExplorer
Author

Thank you for your prompt response! I found that my CUDA version did not match the CUDA version mmcv-full was built for. I solved the issue by specifying the CUDA version explicitly when installing mmcv-full:
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
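The fix amounts to picking the find-links URL whose CUDA and PyTorch tags match the installed PyTorch. A minimal sketch, assuming the URL layout of the working command above (`cuXY/torchX.Y.0`) also holds for other versions; `mmcv_full_install_cmd` is a hypothetical helper, and normalizing the PyTorch version to `X.Y.0` is an assumption copied from that URL, not a documented rule.

```python
def mmcv_full_install_cmd(mmcv_version, torch_version, cuda_version):
    """Hypothetical helper: build a pip command for a pre-built mmcv-full wheel.

    The URL layout mirrors the command that worked in this thread:
    https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
    """
    # '11.1' -> 'cu111'
    major, minor = cuda_version.split(".")[:2]
    cuda_tag = f"cu{major}{minor}"
    # '1.10.0+cu111' -> 'torch1.10.0' (assumption: the index uses the X.Y.0 form)
    base = torch_version.split("+")[0]
    t_major, t_minor = base.split(".")[:2]
    torch_tag = f"torch{t_major}.{t_minor}.0"
    url = f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"
    return f"pip install mmcv-full=={mmcv_version} -f {url}"
```

Called as `mmcv_full_install_cmd("1.4.0", "1.10.0+cu111", "11.1")`, it reproduces the command above. In practice you would pass `torch.__version__` and `torch.version.cuda` from the environment that runs training.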
