I want to start a simple training run on the COCO dataset (all default settings) with a batch size of 4, as described on the Getting Started page of the documentation, but I get a RuntimeError from radix_sort:
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
[03/15 07:25:50 d2.engine.hooks]: Total training time: 0:00:10 (0:00:00 on hooks)
[03/15 07:25:50 d2.utils.events]: iter: 0 lr: N/A max_mem: 1965M
Traceback (most recent call last):
File "./train_net.py", line 161, in <module>
launch(
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/launch.py", line 55, in launch
mp.spawn(
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/launch.py", line 94, in _distributed_worker
main_func(*args)
File "/home/suzy/notebooks/refs/detectron2/tools/train_net.py", line 155, in main
return trainer.train()
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 431, in train
super().train(self.start_iter, self.max_iter)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 138, in train
self.run_step()
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
loss_dict = self.model(data)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 160, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 430, in forward
gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 313, in label_and_sample_anchors
gt_labels_i = self._subsample_labels(gt_labels_i)
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 257, in _subsample_labels
pos_idx, neg_idx = subsample_labels(
File "/home/suzy/miniconda3/envs/refs/lib/python3.8/site-packages/detectron2/modeling/sampling.py", line 50, in subsample_labels
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
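The last frame shows the failure inside torch.randperm(..., device=negative.device) in detectron2/modeling/sampling.py. A minimal sketch that calls the same operation directly on every GPU PyTorch can see may help separate a detectron2 problem from a PyTorch/CUDA installation problem, and doubles as a small reproduction. The helper name probe_randperm is hypothetical, and the tensor size of 100,000 is an assumption chosen to be roughly the scale of the RPN's negative-anchor count.

```python
import os


def probe_randperm(n: int = 100_000) -> dict:
    """Run torch.randperm on every visible CUDA device and record the outcome.

    Returns a mapping such as {"cuda:0": "ok", "cuda:1": "<error text>"}.
    Returns an empty dict when PyTorch sees no CUDA device.
    """
    import torch  # imported lazily so a missing install can be reported cleanly

    results = {}
    if not torch.cuda.is_available():
        return results
    for idx in range(torch.cuda.device_count()):
        device = torch.device("cuda", idx)
        try:
            # Same call pattern as the failing line in detectron2's sampling.py.
            perm = torch.randperm(n, device=device)
            torch.cuda.synchronize(device)  # surface asynchronous CUDA errors here
            results[str(device)] = "ok" if perm.numel() == n else "wrong size"
        except RuntimeError as exc:
            results[str(device)] = str(exc)
    return results


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    try:
        report = probe_randperm()
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if not report:
            print("No CUDA device is visible to PyTorch.")
        for dev, status in report.items():
            print(f"{dev}: {status}")
```

Run it inside the same conda environment used for training. If a device reports the radix_sort error here too, the problem reproduces without detectron2. A CUDA error may leave the process in a bad state, so running the script once per GPU (for example with CUDA_VISIBLE_DEVICES=1) may give cleaner per-device results.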
For the dataset, I use COCO (instances_train2017.json), structured as follows:
You've chosen to report an unexpected problem or bug. Unless you already know its root cause, please include details by filling out the issue template.
The following information is missing: "Instructions To Reproduce the Issue and Full Logs".
We cannot reproduce this, so it's unlikely we'd be able to help with it. If you're able to provide a way to reproduce it (e.g., a Docker image), we can then investigate.
According to your environment info, your CUDA version and PyTorch's CUDA version do not match. That's likely to cause issues.
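One way to check that mismatch is to compare the CUDA version PyTorch was built with (torch.version.cuda) against the toolkit found on PATH (nvcc --version), assuming that toolkit is the "CUDA version" the environment info refers to. The helper names below are hypothetical, and only the major.minor parts are compared. detectron2 also provides python -m detectron2.utils.collect_env, which prints this kind of environment information for the issue template.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from '11.1' or nvcc's 'release 11.1, V11.1.105'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text) or re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_versions_match(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """True when PyTorch's build CUDA and the local nvcc share major.minor."""
    if torch_cuda is None:  # CPU-only PyTorch build reports None
        return False
    built = parse_cuda_version(torch_cuda)
    local = parse_cuda_version(nvcc_output)
    return built is not None and built == local


def local_nvcc_output() -> str:
    """Return `nvcc --version` output, or an empty string if nvcc is unavailable."""
    try:
        proc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
        return proc.stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return ""


if __name__ == "__main__":
    try:
        import torch
        built_with = torch.version.cuda
    except ImportError:
        built_with = None
    nvcc_out = local_nvcc_output()
    print("PyTorch built with CUDA:", built_with)
    print("Local nvcc CUDA:", parse_cuda_version(nvcc_out))
    print("Versions match:", cuda_versions_match(built_with, nvcc_out))
```

A mismatch does not always break a prebuilt PyTorch wheel, but when detectron2 is built from source its CUDA ops are compiled with the local nvcc, so aligning the two versions is a reasonable first step.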
Instructions To Reproduce the 🐛 Bug:
git rev-parse HEAD; git diff
Expected behavior:
I expected training to start.
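For the batch size of 4 mentioned at the top: in detectron2, SOLVER.IMS_PER_BATCH is the total batch across all GPUs, and the data loader expects it to divide evenly by the value passed to --num-gpus. The sketch below shows that arithmetic, assuming the default COCO configs' reference of 16 images at SOLVER.BASE_LR 0.02 and the common linear learning-rate scaling heuristic; plan_batch is a hypothetical helper. Under these assumptions a total of 2 images gives 0.0025, consistent with the single-GPU example on the Getting Started page.

```python
from typing import Tuple


def plan_batch(ims_per_batch: int, num_gpus: int,
               ref_batch: int = 16, ref_lr: float = 0.02) -> Tuple[int, float]:
    """Return (images per GPU, linearly scaled base LR) for a total batch size.

    ref_batch and ref_lr are assumed reference values from the default configs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if ims_per_batch % num_gpus != 0:
        raise ValueError(
            f"IMS_PER_BATCH={ims_per_batch} is not divisible by {num_gpus} GPUs")
    per_gpu = ims_per_batch // num_gpus
    # Linear scaling heuristic: LR grows proportionally with the total batch.
    lr = ref_lr * ims_per_batch / ref_batch
    return per_gpu, lr


if __name__ == "__main__":
    per_gpu, lr = plan_batch(4, 2)
    print(f"images per GPU: {per_gpu}, SOLVER.BASE_LR {lr}")
```

For a total batch of 4 this suggests command-line overrides along the lines of SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005 for train_net.py.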
Environment: