Dear DeepSpeed-Team,

first of all, thank you for your effort; I was very excited to hear about this approach.

I am currently trying to realize a GAN, which requires me to initialize two networks. I tried the following without success:
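(The original snippet is not shown; what follows is a minimal sketch of the pattern described, two back-to-back deepspeed.initialize() calls, one per network. The model, dataset, and argument names are placeholders, not the original code.)

import argparse
import torch
import deepspeed

# Standard DeepSpeed argument parsing.
parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)
args = parser.parse_args()

# Stand-ins for the real generator and discriminator.
generator = torch.nn.Linear(128, 784)
discriminator = torch.nn.Linear(784, 1)
dataset = torch.utils.data.TensorDataset(torch.randn(32, 128))

# First call: this also initializes the torch.distributed process group.
gen_engine, gen_opt, gen_loader, _ = deepspeed.initialize(
    args=args,
    model=generator,
    model_parameters=generator.parameters(),
    training_data=dataset)

# Second call: deepspeed.initialize() calls dist.init_process_group()
# again here, which raises the RuntimeError shown below.
disc_engine, disc_opt, disc_loader, _ = deepspeed.initialize(
    args=args,
    model=discriminator,
    model_parameters=discriminator.parameters(),
    training_data=dataset)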
Executing this, I get the following:

DeepSpeed info: version=0.1.0, git-hash=6d60206, git-branch=master
Traceback (most recent call last):
  File "identification/generative/CPGAN/train_deepspeed.py", line 186, in <module>
    main()
  File "identification/generative/CPGAN/train_deepspeed.py", line 179, in main
    save_every_n_steps=args.save_every_n_steps)
  File "/home/deepspeed/Code/identification/generative/CPGAN/orchestrator_msggan_deepspeed.py", line 253, in train
    training_data=data)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/__init__.py", line 95, in initialize
    collate_fn=collate_fn)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/pt/deepspeed_light.py", line 123, in __init__
    dist.init_process_group(backend="nccl")
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 372, in init_process_group
    raise RuntimeError("trying to initialize the default process group "
RuntimeError: trying to initialize the default process group twice!
It would be nice to get a pointer on how to tackle such a situation, especially since it is a very common use case.
Kind regards
A DeepSpeed team member replied:
That error message is due to the default process group of torch.distributed being initialized twice. To fix this for now, you can add dist_init_required=False to the second call to deepspeed.initialize().
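A minimal sketch of that workaround (names are placeholders, as in the snippet above; the only relevant change is dist_init_required=False on the second call):

# First call sets up the torch.distributed process group as usual.
gen_engine, _, _, _ = deepspeed.initialize(
    args=args,
    model=generator,
    model_parameters=generator.parameters())

# Second call is told the process group already exists, so it skips
# dist.init_process_group() and the RuntimeError goes away.
disc_engine, _, _, _ = deepspeed.initialize(
    args=args,
    model=discriminator,
    model_parameters=discriminator.parameters(),
    dist_init_required=False)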
Removing this inconvenience is on our roadmap, and we hope to get to it soon. We are also very happy to accept PRs :-).