AssertionError: Default process group is not initialized #22
soft link to Outputs/model_logs/cvpods_playground/detection/coco/borderdet/borderdet.res101.fpn.coco.800size.2x
sys.platform: linux
PyTorch built with:
[10/16 16:29:02 cvpods]: Command line arguments: Namespace(dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
In cvpods/engine/launch.py it works.
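For context, a minimal single-process sketch of the kind of setup a launcher such as cvpods/engine/launch.py must perform before any `dist.*` collective is called: initializing the default process group. This is not BorderDet's actual launch code; the helper name is hypothetical, the `gloo` backend is chosen only so the snippet runs without multiple GPUs (real multi-GPU training would typically use `nccl`), and the `dist_url` default is taken from the command-line arguments logged above.

```python
import torch
import torch.distributed as dist

def init_single_process_group(dist_url="tcp://127.0.0.1:50152"):
    # Hypothetical helper: create the default process group for a
    # single-process run so that collectives like all_reduce work.
    if not dist.is_initialized():
        dist.init_process_group(
            backend="gloo",       # "nccl" for real multi-GPU training
            init_method=dist_url,
            world_size=1,
            rank=0,
        )

init_single_process_group()
t = torch.tensor([2.0])
dist.all_reduce(t)  # no longer raises: the default group now exists
print(t.item())
```

With the default group initialized, `dist.all_reduce` on a world of size 1 simply leaves the tensor unchanged instead of raising the assertion below.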
Well, it seems you are using 1 GPU during training? Such an error shouldn't happen. Could you please provide the command you are using?
Since the reporter hasn't replied for a week, we are closing this issue.
[10/16 15:51:32 pods.engine.trainer]: Starting training from iteration 0
ERROR [10/16 15:51:32 pods.engine.trainer]: Exception during training:
Traceback (most recent call last):
File "/media/sda6/yhh/FCOS/BorderDet/cvpods/engine/trainer.py", line 89, in train
self.run_step()
File "/media/sda6/yhh/FCOS/BorderDet/cvpods/engine/trainer.py", line 193, in run_step
loss_dict = self.model(data)
File "/home/yons/anaconda3/envs/borderdet140-yhh/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/media/sda6/yhh/FCOS/BorderDet/cvpods/modeling/meta_arch/borderdet.py", line 167, in forward
bd_box_delta,
File "/media/sda6/yhh/FCOS/BorderDet/cvpods/modeling/meta_arch/borderdet.py", line 237, in losses
dist.all_reduce(num_foreground)
File "/home/yons/anaconda3/envs/borderdet140-yhh/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 901, in all_reduce
_check_default_pg()
File "/home/yons/anaconda3/envs/borderdet140-yhh/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 193, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
[10/16 15:51:32 pods.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
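The traceback shows `dist.all_reduce(num_foreground)` failing because no default process group exists in a plain single-GPU run. A hedged sketch of the kind of guard that sidesteps this assertion is below; the function name is hypothetical and this is not the fix BorderDet itself ships, just an illustration of checking `dist.is_available()` and `dist.is_initialized()` before calling a collective.

```python
import torch
import torch.distributed as dist

def all_reduce_if_distributed(tensor):
    # Only sum across workers when a default process group exists;
    # in a single-process run the tensor is returned unchanged.
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(tensor)
    return tensor

num_foreground = torch.tensor([128.0])
all_reduce_if_distributed(num_foreground)  # single process: no-op
print(num_foreground.item())
```

In a true multi-GPU run the launcher initializes the process group, so the guard passes and the all-reduce averages the foreground count across workers as intended.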