Dear DeepSpeed-Team,

first of all, thank you for your effort; I was very excited to hear about this approach.

I am currently trying to realize a GAN, which requires me to initialize two networks. I tried the following without success:
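(The original snippet is not shown; what follows is a minimal sketch of the pattern described, two back-to-back deepspeed.initialize() calls, one per network. The model, dataset, and argument names are placeholders, not the original code.)

import argparse
import torch
import deepspeed

# Standard DeepSpeed argument parsing.
parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)
args = parser.parse_args()

# Stand-ins for the real generator and discriminator.
generator = torch.nn.Linear(128, 784)
discriminator = torch.nn.Linear(784, 1)
dataset = torch.utils.data.TensorDataset(torch.randn(32, 128))

# First call: this also initializes the torch.distributed process group.
gen_engine, gen_opt, gen_loader, _ = deepspeed.initialize(
    args=args,
    model=generator,
    model_parameters=generator.parameters(),
    training_data=dataset)

# Second call: deepspeed.initialize() calls dist.init_process_group()
# again here, which raises the RuntimeError shown below.
disc_engine, disc_opt, disc_loader, _ = deepspeed.initialize(
    args=args,
    model=discriminator,
    model_parameters=discriminator.parameters(),
    training_data=dataset)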
Executing this, I get the following:

DeepSpeed info: version=0.1.0, git-hash=6d60206, git-branch=master
Traceback (most recent call last):
  File "identification/generative/CPGAN/train_deepspeed.py", line 186, in <module>
    main()
  File "identification/generative/CPGAN/train_deepspeed.py", line 179, in main
    save_every_n_steps=args.save_every_n_steps)
  File "/home/deepspeed/Code/identification/generative/CPGAN/orchestrator_msggan_deepspeed.py", line 253, in train
    training_data=data)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/__init__.py", line 95, in initialize
    collate_fn=collate_fn)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/pt/deepspeed_light.py", line 123, in __init__
    dist.init_process_group(backend="nccl")
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 372, in init_process_group
    raise RuntimeError("trying to initialize the default process group "
RuntimeError: trying to initialize the default process group twice!
It would be nice to get a pointer on how to tackle such a situation, especially since it is a very common use case.
Kind regards
A DeepSpeed team member replied:
That error message is due to the default process group of torch.distributed being initialized twice. To fix this for now, you can add dist_init_required=False to the second call to deepspeed.initialize().
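A minimal sketch of that workaround (names are placeholders, as in the snippet above; the only relevant change is dist_init_required=False on the second call):

# First call sets up the torch.distributed process group as usual.
gen_engine, _, _, _ = deepspeed.initialize(
    args=args,
    model=generator,
    model_parameters=generator.parameters())

# Second call is told the process group already exists, so it skips
# dist.init_process_group() and the RuntimeError goes away.
disc_engine, _, _, _ = deepspeed.initialize(
    args=args,
    model=discriminator,
    model_parameters=discriminator.parameters(),
    dist_init_required=False)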
Removing this inconvenience is on our roadmap, and we hope to get to it soon. We are also very happy to accept PRs :-).