How to solve the error "cuda runtime error (98) : invalid device function" when running BorderDet? #53

Open
fengqian-wei opened this issue May 31, 2021 · 4 comments

fengqian-wei commented May 31, 2021

I first installed cvpods and successfully trained RetinaNet.
However, I hit an error in border_align when training BorderDet.
I don't know how to fix it.

Exception occurred: RuntimeError
cuda runtime error (98) : invalid device function at /home//weizhiwei/work/cvpods/cvpods/layers/csrc/border_align/border_align_kernel.cu:202
File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 15, in forward
output = _C.border_align_forward(input, boxes, wh, pool_size)
File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 42, in forward
output = border_align(feature, boxes, wh, self.pool_size)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 809, in forward
ltrb_conv = self.border_align(feature, boxes)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 734, in forward
border_cls_conv = self.border_cls_subnet(cls_subnet, align_boxes, wh)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 147, in forward
) = self.head(features, shifts)
File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 185, in run_step
loss_dict = self.model(data)
File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 84, in train
self.run_step()
File "/home/weizhiwei/work/cvpods/cvpods/engine/runner.py", line 271, in train
super().train(self.start_iter, self.start_epoch, self.max_iter)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 96, in main
runner.train()
File "/home/weizhiwei/work/cvpods/cvpods/engine/launch.py", line 56, in launch
main_func(*args)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 110, in
args=(args,),

@fengqian-wei
Author

Error when training on a single GPU: "Default process group is not initialized"
Solution: https://blog.csdn.net/m0_37568067/article/details/109785209

Member

Maycbj commented Jun 2, 2021

https://xiulian.blog.csdn.net/article/details/111035882
It works well on 2080Ti and V100. You could try the steps described here:
https://blog.csdn.net/m0_38007695/article/details/107065617
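
For anyone debugging this locally: one common cause of "invalid device function" is a CUDA extension compiled for a different GPU architecture than the one it runs on. The thread does not confirm that this is the cause here, so treat the sketch below as a diagnostic aid under that assumption. It prints the GPU's compute capability and the matching `TORCH_CUDA_ARCH_LIST` entry. The helper names `arch_list_value` and `report` are made up for illustration and are not part of cvpods.

```python
# Diagnostic sketch. Assumption: the error comes from an architecture mismatch
# between the compiled extension and the GPU; the thread does not confirm this.


def arch_list_value(capability):
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry."""
    major, minor = capability
    return f"{major}.{minor}"


def report():
    """Print the visible GPU's capability and a suggested arch entry."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible.")
        return
    cap = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), "capability:", cap)
    # Architectures the PyTorch binary itself was built for (not the cvpods extension).
    print("PyTorch built for:", torch.cuda.get_arch_list())
    print("Suggested TORCH_CUDA_ARCH_LIST entry:", arch_list_value(cap))


if __name__ == "__main__":
    report()
```

For reference, a 2080Ti has compute capability 7.5 and a V100 has 7.0. If the mismatch assumption holds, setting the `TORCH_CUDA_ARCH_LIST` environment variable to the printed value, removing the old build output, and rebuilding the cvpods extensions may help.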

Member

Maycbj commented Jun 2, 2021

Error when training on a single GPU: "Default process group is not initialized"
Solution: https://blog.csdn.net/m0_37568067/article/details/109785209

Yeah, it is a well-known bug when training on a single GPU. We will fix the "Default process group is not initialized" error.
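
Until a fix lands, a typical way to avoid this error in single-GPU runs is to check whether a process group exists before querying it. The sketch below is a generic pattern, not the cvpods fix. The helper names `get_world_size_safe` and `get_rank_safe` are hypothetical.

```python
# Generic guard pattern; an assumption about how such a fix could look,
# not the actual cvpods change.
try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run where PyTorch is absent
    dist = None


def _group_ready():
    """True only if torch.distributed is usable and a default group exists."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def get_world_size_safe():
    """Return the world size, falling back to 1 for non-distributed runs."""
    return dist.get_world_size() if _group_ready() else 1


def get_rank_safe():
    """Return the process rank, falling back to 0 for non-distributed runs."""
    return dist.get_rank() if _group_ready() else 0
```

Code that calls `dist.get_world_size()` directly raises the error when no group was initialized. Routing those calls through guards like these lets the same script run with one GPU or many.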

wwwyyk commented Jun 25, 2021

Have you solved this problem? I can't solve it either.
