I am trying to train VL-BERT for RefCOCO+ (python refcoco/train_end2end.py --cfg cfgs/refcoco/base_detected_regions_4x16G.yaml). However, I am getting the following CUDA-related error.
THCudaCheck FAIL file=/project/ocean/tsriniva/VL-BERT/common/lib/roi_pooling/cuda/ROIAlign_cuda.cu line=297 error=98 : invalid device function
Traceback (most recent call last):
  File "refcoco/train_end2end.py", line 60, in <module>
    main()
  File "refcoco/train_end2end.py", line 54, in main
    rank, model = train_net(args, config)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../refcoco/function/train.py", line 323, in train_net
    gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../common/trainer.py", line 115, in train
    outputs, loss = net(*batch)
  File "/home/tsriniva/anaconda2/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../common/module.py", line 22, in forward
    return self.train_forward(*inputs, **kwargs)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../refcoco/modules/resnet_vlbert_for_refcoco.py", line 96, in train_forward
    segms=None)
  File "/home/tsriniva/anaconda2/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../common/fast_rcnn.py", line 149, in forward
    roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
  File "/home/tsriniva/anaconda2/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 69, in forward
    input.float(), rois.float(), self.output_size, self.spatial_scale, self.sampling_ratio
  File "/project/ocean/tsriniva/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 20, in forward
    input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio
RuntimeError: cuda runtime error (98) : invalid device function at /project/ocean/tsriniva/VL-BERT/common/lib/roi_pooling/cuda/ROIAlign_cuda.cu:297
Segmentation fault (core dumped)
Is there any fix for this?
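For anyone trying to diagnose this: "invalid device function" from a compiled extension often means the extension was built for a different GPU architecture than the one it is running on. This is only a likely cause, not a confirmed diagnosis for this setup. The sketch below is a rough check under that assumption. The helper names `parse_arch_list` and `arch_list_covers` are hypothetical, and the matching rule is deliberately simplified: an exact architecture match, or a `+PTX` entry at or below the device's capability. It only handles numeric entries such as `6.1` or `7.0+PTX`, as used in PyTorch's `TORCH_CUDA_ARCH_LIST` build variable.

```python
import re


def parse_arch_list(arch_list):
    """Parse a TORCH_CUDA_ARCH_LIST-style string such as "6.1;7.0+PTX".

    Returns a list of ((major, minor), has_ptx) tuples. Only numeric
    entries are handled; named architectures are out of scope here.
    """
    entries = []
    for token in re.split(r"[;\s]+", arch_list.strip()):
        if not token:
            continue
        has_ptx = token.endswith("+PTX")
        version = token[:-4] if has_ptx else token
        major, minor = version.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def arch_list_covers(capability, arch_list):
    """Simplified check: is the device capability covered by the build list?

    Covered means an exact architecture match, or a +PTX entry whose
    version is not newer than the device (PTX can be JIT-compiled forward).
    """
    for arch, has_ptx in parse_arch_list(arch_list):
        if arch == capability:
            return True
        if has_ptx and arch <= capability:
            return True
    return False


# Example use on the training machine (requires PyTorch and a visible GPU):
#   import os, torch
#   cap = torch.cuda.get_device_capability(0)      # e.g. (7, 0)
#   built_for = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
#   print(cap, arch_list_covers(cap, built_for))
```

If the check reports a mismatch, rebuilding the extension under `common/lib/roi_pooling` on the machine with the target GPU, with `TORCH_CUDA_ARCH_LIST` set to include that GPU's capability, would be the natural thing to try.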