How to use RandomSampler in RetinaNet #6971

HuXinzhi1004 · 2022-01-09T12:16:24Z

When I use RandomSampler in RetinaNet, I get the following error.

/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "tools/train.py", line 188, in
main()
File "tools/train.py", line 184, in main
meta=meta)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/apis/train.py", line 175, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 233, in train_step
losses = self(**data)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 167, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 79, in forward_train
gt_labels, gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 652, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 182, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 510, in loss
label_channels=label_channels)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 417, in get_targets
unmap_outputs=unmap_outputs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(map_results)))
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 318, in _get_targets_single
fill=self.num_classes)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 37, in unmap
ret[inds.type(torch.bool)] = data
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1607370141920/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f6be11248b2 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0xad2 (0x7f6be1376982 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f6be110fb7d in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fea0a (0x7f6c1e461a0a in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5feab6 (0x7f6c1e461ab6 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #23: __libc_start_main + 0xf0 (0x7f6c48515840 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

Does anyone know why?


RangiLyu · 2022-01-10T13:30:43Z

How did you change the config? Usually, the sampling method cannot be applied to models with FocalLoss.
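
For context on the remark above: focal loss scales each anchor's cross-entropy by (1 - p_t)^gamma, so easy negatives contribute very little even when every anchor is kept, which is why RetinaNet is normally trained without a sampler. Below is a minimal pure-Python sketch of the binary form. The function name is illustrative and not mmdet's API, and alpha=0.25 and gamma=2.0 are assumed default values.

```python
import math

def binary_focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one prediction; p is the predicted foreground probability."""
    # p_t is the probability the model assigns to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = binary_focal_loss(0.01, 0)  # background, confidently rejected
hard_negative = binary_focal_loss(0.90, 0)  # background, wrongly scored high
print(easy_negative, hard_negative)
```

The easy negative's loss is several orders of magnitude smaller than the hard negative's, so the class imbalance is handled by the loss itself rather than by subsampling anchors.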

HuXinzhi1004 · 2022-01-12T04:30:39Z

> How did you change the config? Usually, the sampling method cannot be applied to models with FocalLoss.

sampler=dict(
    type='RandomSampler',
    num=512,
    pos_fraction=0.25,
    neg_pos_ub=-1,
    add_gt_as_proposals=True),

I also set
self.sampling = True
in mmdet/models/dense_heads/retina_head.py.
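
One plausible source of the "index out of bounds" assertion, not confirmed in this thread: with add_gt_as_proposals=True, the sampler (to my understanding) prepends the ground-truth boxes to the candidate list, so the sampled indices refer to a list that is longer than the anchor set. The toy sketch below uses plain Python and hypothetical function names to show the mismatch; it does not call mmdet.

```python
def sample_indices(num_anchors, num_gts, add_gt_as_proposals):
    """Return the candidate indices a proposal-style sampler would draw from."""
    # Assumption: GT boxes are prepended, so candidates = GTs + anchors.
    num_candidates = num_anchors + (num_gts if add_gt_as_proposals else 0)
    return list(range(num_candidates))

def out_of_range(indices, num_anchors):
    """Indices that cannot be used to index a per-anchor target array."""
    return [i for i in indices if i >= num_anchors]

num_anchors, num_gts = 5, 2
bad = out_of_range(sample_indices(num_anchors, num_gts, True), num_anchors)
ok = out_of_range(sample_indices(num_anchors, num_gts, False), num_anchors)
print(bad, ok)
```

If this is the cause, setting add_gt_as_proposals=False in the sampler config may be worth trying; mmdet's stock RPN sampler configs use that setting.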

RangiLyu · 2022-01-27T08:24:14Z

One possible reason is that your loss or gradient became NaN. I have not tried any sampling method with focal loss, so I don't know how to deal with this.
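
To check the NaN hypothesis, a small helper like the following could flag non-finite loss terms at each step. This is a sketch that assumes the loss values have already been converted to Python floats (for tensors, for example with `.item()`); the function name and the sample values are hypothetical.

```python
import math

def non_finite_losses(losses):
    """Return the names of loss terms whose value is NaN or infinite."""
    return [name for name, value in losses.items() if not math.isfinite(value)]

# Hypothetical values for illustration; real ones would come from the training log.
step_losses = {"loss_cls": float("nan"), "loss_bbox": 0.42}
print(non_finite_losses(step_losses))
```

An empty list on every step up to the crash would suggest the failure comes from indexing rather than from a diverging loss.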

HuXinzhi1004 changed the title ~~How~~ How to use Jan 9, 2022

HuXinzhi1004 changed the title ~~How to use~~ How to use RandomSampler in RetinaNet Jan 9, 2022

RangiLyu self-assigned this Jan 10, 2022

RangiLyu added the "usage" label (About how to use/change the configs/codes etc.) Jan 10, 2022

RangiLyu added the "community help wanted" (Extra attention is needed) and "community discussion" labels Jan 27, 2022

hhaAndroid closed this as completed Feb 9, 2022
