
about multi-gpu support #34

Closed
pipipopo opened this issue Aug 29, 2018 · 6 comments

Comments

@pipipopo

Hi, thanks for your amazing work. I have a question about multi-GPU support. The original batch size is 2: one real image plus one random sample, so effectively a single pair. To train faster on multiple GPUs, the batch size should be larger, but simply increasing it causes an error. Is it easy to modify the code for this purpose?

@junyanz
Owner

junyanz commented Aug 30, 2018

What is your error? Have you set --gpu_ids?

@pipipopo
Author

I get the following error when using this configuration:

GPU_ID=0,1
--gpu_ids ${GPU_ID}
--batchSize 4

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

@pipipopo
Author

pipipopo commented Sep 3, 2018

create web directory ../checkpoints/edges2shoes/edges2shoes_bicycle_gan/web...
Traceback (most recent call last):
  File "./train.py", line 34, in <module>
    model.optimize_parameters()
  File "../BicycleGAN-master/models/bicycle_gan_model.py", line 211, in optimize_parameters
    self.update_G_and_E()
  File "../BicycleGAN-master/models/bicycle_gan_model.py", line 199, in update_G_and_E
    self.backward_EG()
  File "../BicycleGAN-master/models/bicycle_gan_model.py", line 151, in backward_EG
    self.loss_G_GAN = self.backward_G_GAN(self.fake_data_encoded, self.netD, self.opt.lambda_GAN)
  File "../BicycleGAN-master/models/bicycle_gan_model.py", line 143, in backward_G_GAN
    pred_fake = netD(fake)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "../BicycleGAN-master/models/networks.py", line 263, in forward
    result.append(self.model[i](down))
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "../anaconda2/envs/xx/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

@pipipopo
Author

pipipopo commented Sep 3, 2018

After reading the PyTorch docs, I still don't know how to assign the data to the right devices in BicycleGAN...

@pipipopo
Author

pipipopo commented Sep 3, 2018

Should
net = torch.nn.DataParallel(net, gpu_ids)
be paired with a matching
data.to(device)
call for the inputs?
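For reference, a minimal sketch of the usual DataParallel pattern in generic PyTorch (not BicycleGAN's exact code): wrap the network once, then move each input batch to the primary device. DataParallel scatters the batch across gpu_ids and gathers the outputs, so both the parameters and the inputs must start on gpu_ids[0]. The sketch falls back to CPU when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn

# Pick devices: use two GPUs if available, otherwise run on CPU.
gpu_ids = [0, 1] if torch.cuda.device_count() >= 2 else []
device = torch.device(f'cuda:{gpu_ids[0]}') if gpu_ids else torch.device('cpu')

# Move the network to the primary device BEFORE wrapping it.
net = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
if gpu_ids:
    net = nn.DataParallel(net, device_ids=gpu_ids)

# Inputs also go to the primary device; DataParallel splits the batch
# (here 4 -> 2 + 2) across the listed GPUs and gathers the result.
x = torch.randn(4, 3, 16, 16).to(device)
y = net(x)
print(y.shape)  # torch.Size([4, 8, 16, 16])
```
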

@junyanz
Owner

junyanz commented Sep 3, 2018

I found the issue. If you set --netD basic_256 and --netD2 basic_256, it should work. It seems that the multi-GPU issue comes from multi-scale D (--netD basic_256_multi). I haven't had time to address the issue there. I also updated the code so that it is compatible with pix2pix/CycleGAN. You may want to check out the latest commit.
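For context, one common way this kind of device mismatch arises under DataParallel (an illustration, not necessarily the exact bug in networks.py): submodules held in a plain Python list are invisible to nn.Module, so .to(device) and DataParallel's replicate step never move or copy them, and their weights stay on the original device. Registering them via nn.ModuleList fixes that.

```python
import torch
import torch.nn as nn

class BadMulti(nn.Module):
    """Stores sub-discriminators in a plain list: NOT registered."""
    def __init__(self):
        super().__init__()
        self.model = [nn.Conv2d(3, 8, 3) for _ in range(2)]

class GoodMulti(nn.Module):
    """Stores sub-discriminators in nn.ModuleList: registered properly."""
    def __init__(self):
        super().__init__()
        self.model = nn.ModuleList(nn.Conv2d(3, 8, 3) for _ in range(2))

bad, good = BadMulti(), GoodMulti()
# The plain list exposes no parameters, so .to()/replicate skip those layers.
print(len(list(bad.parameters())))   # 0
print(len(list(good.parameters())))  # 4  (2 convs x weight + bias)
```
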
