error in ms_deformable_col2im_cuda: invalid device function #13

Open
chen1234520 opened this issue Dec 11, 2020 · 10 comments

Comments

@chen1234520

help help!!!! Thanks!

@lhlm1994

Me too. Do you have trouble running 'make.sh'? I can run make.sh in my shell, but I cannot run it from PyCharm.

@jackroos
Member

@chen1234520 Do you see any errors during compilation? Please provide more details about the environment you use, including system, CUDA, PyTorch, etc. Thank you!
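
For anyone collecting these details, a minimal sketch along the following lines may help. It is not part of this project: the function name `collect_env` is hypothetical, and it only uses standard `platform` and PyTorch calls. PyTorch is imported optionally so the script still reports the system details if the import itself fails.

```python
import platform

try:
    import torch  # optional: the report still works without PyTorch installed
except ImportError:
    torch = None


def collect_env():
    """Gather the system, CUDA and PyTorch details asked for above (hypothetical helper)."""
    info = {
        "system": platform.platform(),
        "python": platform.python_version(),
        "pytorch": None,
        "cuda_build": None,
        "gpus": [],
    }
    if torch is not None:
        info["pytorch"] = torch.__version__
        # CUDA version PyTorch was built against (None on CPU-only builds).
        info["cuda_build"] = torch.version.cuda
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                # (device name, (major, minor) compute capability)
                info["gpus"].append(
                    (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
                )
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Note that `cuda_build` is the CUDA version PyTorch was built with; the toolkit used to compile the extension may differ, so it is worth reporting the output of `nvcc --version` as well.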

@azamshoaib

@jackroos I am also getting this error. There is no error in the compilation stage. When running the code, I can see the epochs along with this error. The following are my environment details:
Cuda : 10.2
pytorch : 1.6.0
Ubuntu 18.04

Kindly help me in this regard. Thank you

@jackroos
Member

@ayberksener Could you check if you can run the unit test as described in the README?

@1216143369

@jackroos I am also getting this error. There is no error in the compilation stage. When running the code, I can see the epochs along with this error. The following are my environment details:
Cuda : 10.2
pytorch : 1.6.0
Ubuntu 18.04

Kindly help me in this regard. Thank you

Have you solved the problem yet?

@GehenHe

GehenHe commented Jan 26, 2021

Similar issue: I get 'error in ms_Deformable_im2col_cuda: invalid device function' when running "python test.py".

@ycliu93

ycliu93 commented Feb 2, 2021

I encountered the same issue. I am not sure whether it is related to Slurm.
I can see that the multiscaledeformableattention package is listed among the installed conda packages.

Running python test.py works well on a single GPU, while srun < some slurm parameters> python test.py does not.
The same situation occurs when I run the code using Slurm.

More specifically, I use a single node with 8 GPUs via Slurm, but only one GPU works; the remaining 7 GPUs cannot find the multiscaledeformableattention CUDA kernel.
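
One thing that may be worth checking here: CUDA reports "invalid device function" when a kernel was not compiled for the architecture of the GPU it is launched on, and PyTorch's extension builder by default compiles only for the GPUs visible at build time unless `TORCH_CUDA_ARCH_LIST` is set. The sketch below is an assumption-laden illustration, not a diagnosis of this setup: the helper names are hypothetical, it only handles numeric arch lists such as "6.1;7.0", and it ignores PTX forward compatibility.

```python
def parse_arch_list(arch_list):
    """Parse a numeric arch list such as "6.1;7.0 7.5+PTX" into (major, minor) tuples."""
    archs = []
    for token in arch_list.replace(";", " ").split():
        # The "+PTX" suffix is dropped; PTX JIT compatibility is not modelled here.
        major, minor = token.replace("+PTX", "").split(".")
        archs.append((int(major), int(minor)))
    return archs


def is_covered(capability, built_archs):
    """A binary built for X.y also runs on GPUs with the same major X and minor >= y."""
    return any(
        major == capability[0] and minor <= capability[1]
        for major, minor in built_archs
    )


def uncovered_gpus(capabilities, built_archs):
    """Return the GPU capabilities that none of the built architectures can serve."""
    return [cap for cap in capabilities if not is_covered(cap, built_archs)]


if __name__ == "__main__":
    # Hypothetical values: an extension built for 6.1 and 7.0, run on three GPU types.
    built = parse_arch_list("6.1;7.0")
    print(uncovered_gpus([(7, 0), (7, 5), (8, 0)], built))  # prints [(8, 0)]
```

On a real machine the capabilities can be read with `torch.cuda.get_device_capability(i)` for each visible device, on the node where the job actually runs.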

@GehenHe

GehenHe commented Feb 3, 2021

I solved this problem by installing pytorch=1.5 with cuda=9.2, i.e.:
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch

It seems that the cuda operator is only compatible with pytorch-1.5 with cuda 9.2.

@WindChaserZ

My CUDA version is 11.0, and I also have trouble running python test.py because of "no module named multiscaledeformableattention", even though I ran make.sh successfully. I guess the project does not support compilation with CUDA 11.
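
A "no module named" error after a successful build can also mean that the interpreter running test.py is not the one make.sh installed into (for example a different conda environment or IDE interpreter). A minimal sketch for checking this, assuming only the standard library; the helper name is hypothetical, and the module name below is spelled as in this thread, so the exact import name and casing should be taken from the project's setup script.

```python
import importlib.util
import sys


def extension_installed(module_name):
    """Return True if this interpreter can locate the top-level module `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module name as written in this thread; the real import name may differ in casing.
    name = "multiscaledeformableattention"
    print("interpreter:", sys.executable)
    print(f"{name} found:", extension_installed(name))
```

Running this with the same command used to launch test.py shows which interpreter is in use and whether it can see the compiled extension.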

@ayushjain1144

ayushjain1144 commented May 6, 2021

I had the same issue as @ycliu93. I was able to train on one node, but it threw this error when I switched to multiple nodes.
The solution was to delete the build directory and run ./make.sh again on the multi-GPU node. After that, both python test.py and training worked as expected. (I am using CUDA 10.2.)
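
The rebuild step described above could be scripted roughly as follows. This is only a sketch: `rebuild_ops` and its `ops_dir` argument are hypothetical names, and it assumes the directory passed in is the one that contains both `build` and `make.sh`.

```python
import shutil
import subprocess
from pathlib import Path


def rebuild_ops(ops_dir):
    """Delete `<ops_dir>/build` and run `<ops_dir>/make.sh` again (hypothetical helper)."""
    ops_dir = Path(ops_dir)
    # Remove stale build artifacts; do nothing if the directory does not exist.
    shutil.rmtree(ops_dir / "build", ignore_errors=True)
    # check=True raises CalledProcessError if the build script fails.
    subprocess.run(["sh", "./make.sh"], cwd=ops_dir, check=True)


# Example (run on the node whose GPUs will execute the job):
# rebuild_ops(".")
```

Running it on the node that will execute the job matters if the build targets only the GPUs visible at build time.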

9 participants