How to use RandomSampler in RetinaNet #6971

Closed
HuXinzhi1004 opened this issue Jan 9, 2022 · 3 comments
Labels: community discussion; community help wanted (Extra attention is needed); usage (About how to use/change the configs/codes etc.)


HuXinzhi1004 commented Jan 9, 2022

When I use RandomSampler in RetinaNet, I get the following error.

/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "tools/train.py", line 188, in <module>
main()
File "tools/train.py", line 184, in main
meta=meta)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/apis/train.py", line 175, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 233, in train_step
losses = self(**data)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 167, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 79, in forward_train
gt_labels, gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 652, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 182, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 510, in loss
label_channels=label_channels)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 417, in get_targets
unmap_outputs=unmap_outputs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 318, in _get_targets_single
fill=self.num_classes)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 37, in unmap
ret[inds.type(torch.bool)] = data
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1607370141920/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f6be11248b2 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f6be1376982 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f6be110fb7d in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fea0a (0x7f6c1e461a0a in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5feab6 (0x7f6c1e461ab6 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #23: __libc_start_main + 0xf0 (0x7f6c48515840 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

Does anyone know why?
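
A note on reading the traceback above: CUDA kernels are launched asynchronously, so the Python frame that reports "device-side assert triggered" (here `unmap` in mmdet/core/utils/misc.py) is not necessarily the line that produced the bad index. A minimal sketch for getting a synchronous traceback, assuming the variable is set before any CUDA work starts; the helper name is hypothetical and not part of mmdet or PyTorch:

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    This has to happen before the first CUDA call; the safest place is
    before `import torch`. Defaults to the current process environment.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


if __name__ == "__main__":
    enable_blocking_cuda_launches()
    # Import torch and start training only after this point.
```

The same effect can be had from the shell by prefixing the training command with `CUDA_LAUNCH_BLOCKING=1`. Training runs slower in this mode, so it is only meant for locating the failing operation.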

HuXinzhi1004 changed the title from "How" to "How to use" on Jan 9, 2022
HuXinzhi1004 changed the title from "How to use" to "How to use RandomSampler in RetinaNet" on Jan 9, 2022

RangiLyu (Member) commented

How did you change the config? Usually, the sampling method cannot be applied to models with FocalLoss.
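
For context on the remark above: focal loss down-weights well-classified examples, which is why RetinaNet can train on all anchors without a sampler. A minimal sketch of the binary focal loss for a single prediction, using the defaults alpha=0.25 and gamma=2 from the RetinaNet paper; the function name is illustrative and this is not mmdet's implementation:

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p in (0, 1).

    target is 1 for a positive anchor and 0 for a negative one.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (background predicted with p=0.01) contributes far less
# than a hard negative (background predicted with p=0.9).
easy_negative = focal_loss(0.01, target=0)
hard_negative = focal_loss(0.9, target=0)
print(easy_negative < hard_negative)
```

Because easy negatives are already suppressed by the modulating factor, subsampling them in addition changes the balance the loss was tuned for.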

RangiLyu self-assigned this on Jan 10, 2022
RangiLyu added the usage label on Jan 10, 2022

HuXinzhi1004 (Author) commented

> How did you change the config? Usually, the sampling method cannot be applied to models with FocalLoss.

sampler=dict(
    type='RandomSampler',
    num=512,
    pos_fraction=0.25,
    neg_pos_ub=-1,
    add_gt_as_proposals=True),

I also set
self.sampling = True
in mmdet/models/dense_heads/retina_head.py.
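
Something that may be worth ruling out, although nobody in this thread confirmed it: in mmdet's BaseSampler, `add_gt_as_proposals=True` concatenates the ground-truth boxes in front of the candidate boxes before sampling, so the sampled indices refer to a list that is longer than the anchor list. The RPN configs shipped with mmdet use `add_gt_as_proposals=False`. The toy sketch below uses plain Python lists rather than mmdet code (all function names are hypothetical) to show how such indices can exceed the anchor count:

```python
import random


def sample_candidate_indices(num_anchors, num_gts, num_samples,
                             add_gt_as_proposals, seed=0):
    """Toy stand-in for a sampler: pick indices from the candidate list.

    Assumption: when add_gt_as_proposals is True, the GT boxes are
    prepended, so there are num_gts + num_anchors candidates.
    """
    num_candidates = num_anchors + (num_gts if add_gt_as_proposals else 0)
    rng = random.Random(seed)
    k = min(num_samples, num_candidates)
    return sorted(rng.sample(range(num_candidates), k))


def indices_beyond_anchors(indices, num_anchors):
    """Indices that would be out of bounds for a tensor with num_anchors rows."""
    return [i for i in indices if i >= num_anchors]


# 8 anchors, 3 GT boxes, and a sample budget large enough to take everything.
with_gt = sample_candidate_indices(8, 3, 11, add_gt_as_proposals=True)
without_gt = sample_candidate_indices(8, 3, 11, add_gt_as_proposals=False)
print(indices_beyond_anchors(with_gt, 8))     # [8, 9, 10]
print(indices_beyond_anchors(without_gt, 8))  # []
```

If this is what happens in the custom tail_retina_head.py, setting `add_gt_as_proposals=False` in the sampler config would be the first thing to try.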

RangiLyu (Member) commented

One possible reason is that your loss or gradient became NaN. I have not tried any sampling method with focal loss before, so I am not sure how to deal with this.
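
To check the NaN hypothesis, the loss terms can be tested for non-finite values at each iteration. A minimal sketch, assuming the loss values have already been converted to Python floats (for example with `.item()` on each tensor); the function name and the example values are made up for illustration:

```python
import math
from typing import Dict, List


def non_finite_losses(losses: Dict[str, float]) -> List[str]:
    """Return the names of loss terms that are NaN or infinite."""
    return [name for name, value in losses.items()
            if not math.isfinite(value)]


# Hypothetical loss values, for illustration only.
example = {"loss_cls": float("nan"), "loss_bbox": 0.42}
print(non_finite_losses(example))  # ['loss_cls']
```

An empty list at the iteration where the assert fires would suggest the problem lies in target assignment rather than in the loss values.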
