I set up the environment according to your README and the requirements.txt you provided in previous issues. However, when I try to run the training script
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
the MMCV package raises the following error, even though I am using a CUDA environment:
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
    out = _inner_forward(x)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 274, in _inner_forward
    out = self.conv2(out)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 251, in forward
    return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 73, in forward
    ext_module.modulated_deform_conv_forward(
RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.
The full error log is available here: error_log.txt. Line 55 of the error log shows "MMCV CUDA Compiler: not available", which may be the cause. Please note that I'm running the codebase on a Slurm GPU cluster: the login node has no GPU by default, so I have to request GPU resources from the scheduler. I ran the script only after the GPU resources had been allocated, but it still shows the error above.
Following this link, I also tried installing the CUDA build of mmcv-full with the command below, but I still get the same error:
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu112/torch1.10/index.html
Is there any way to solve this issue? Thanks!
Best regards
It seems that something is wrong with the mmcv installation. I would suggest: (1) check that your CUDA version matches the CUDA version of the pre-built mmcv-full package; (2) rebuild or re-install the environment after the GPU resources have been assigned.
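For anyone hitting the same error, the version check in (1) could be scripted roughly as follows. This is only a sketch: `collect_cuda_report` is a hypothetical helper name, and it assumes an mmcv-full 1.x install, where `mmcv.ops` exposes `get_compiling_cuda_version` and `get_compiler_version`. Run it on a compute node after the GPU has been allocated, not on the login node.

```python
import importlib


def collect_cuda_report():
    """Gather the CUDA versions seen by PyTorch and mmcv (hypothetical helper).

    Missing packages are tolerated; their entries simply stay None.
    """
    report = {
        "torch": None,
        "torch_cuda": None,
        "gpu_visible": None,
        "mmcv_cuda": None,
        "mmcv_compiler": None,
    }
    try:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        # CUDA version PyTorch was built with; None on CPU-only builds.
        report["torch_cuda"] = torch.version.cuda
        # False on a node without a visible GPU (e.g. a login node).
        report["gpu_visible"] = torch.cuda.is_available()
    except ImportError:
        pass
    try:
        ops = importlib.import_module("mmcv.ops")
        # CUDA version the mmcv ops were compiled with.
        report["mmcv_cuda"] = ops.get_compiling_cuda_version()
        report["mmcv_compiler"] = ops.get_compiler_version()
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in collect_cuda_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and `mmcv_cuda` disagree, or `mmcv_cuda` reports that CUDA is not available, the installed mmcv-full wheel most likely does not match the PyTorch build.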
Thank you for your prompt response! I found that my CUDA version did not match the CUDA version of the mmcv-full build. I solved the issue by specifying the exact CUDA version when installing mmcv-full, with the following command: pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
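The index URL in that command follows a cu<CUDA>/torch<version> pattern, so it can be derived from the versions PyTorch reports. A minimal sketch, assuming that pattern holds for your combination: `mmcv_index_url` is a hypothetical helper, and a wheel is not guaranteed to exist for every CUDA/PyTorch pair.

```python
def mmcv_index_url(cuda_version, torch_version):
    """Build the mmcv-full wheel index URL (hypothetical helper).

    cuda_version: e.g. "11.1", as reported by torch.version.cuda.
    torch_version: e.g. "1.10.0+cu111", as reported by torch.__version__.
    """
    cu_tag = "cu" + cuda_version.replace(".", "")
    # Drop the local build tag ("+cu111") from the PyTorch version.
    torch_tag = "torch" + torch_version.split("+")[0]
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


print(mmcv_index_url("11.1", "1.10.0+cu111"))
# prints https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
```

The result can be passed to `pip install mmcv-full==1.4.0 -f <url>`, which reproduces the command that fixed the issue above.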