
Large batch size and multiple GPUs #137

Closed
jjcao opened this issue Oct 31, 2017 · 4 comments

@jjcao

jjcao commented Oct 31, 2017

When I train the pix2pix model with batchSize > 1, norm = batch, and multiple GPUs, the results look wrong/strange.

When I train the pix2pix model with batchSize > 1, norm = batch, and a single GPU, the results are correct.

Could this be fixed?

Thank you.

@junyanz
Owner

junyanz commented Oct 31, 2017

We have observed that batchSize=1 with a single GPU gives the best results so far.
According to this post, it seems that PyTorch calculates mean/var statistics separately for each GPU.
So how many images do you have per GPU? Only 1 per GPU might cause problems for batchnorm.
Have you tried instancenorm with the multi-GPU setting and batchSize > 1?
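
A minimal sketch of the difference (toy layers, not the pix2pix generator; the multi-GPU split is only described in the comments, since this snippet runs on CPU):

```python
import torch
import torch.nn as nn

# Toy normalization layers, only to show which statistics each one uses.
bn = nn.BatchNorm2d(3)        # mean/var computed across the samples on *one* device
inorm = nn.InstanceNorm2d(3)  # mean/var computed per sample, per channel

x = torch.randn(4, 3, 8, 8)   # batchSize = 4

# Under nn.DataParallel with 2 GPUs, each replica would see only 2 of these
# 4 samples, so BatchNorm estimates its statistics from just 2 images per
# device. InstanceNorm normalizes each image on its own, so splitting the
# batch across GPUs does not change its behavior.
print(bn(x).shape, inorm(x).shape)  # both torch.Size([4, 3, 8, 8])
```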

@jjcao
Author

jjcao commented Oct 31, 2017

Yes, with --norm instance it worked.

How do I specify the number of images per GPU? Is there an option for it, or would it be simple to change in your code?

@junyanz
Owner

junyanz commented Oct 31, 2017

I guess the number of images per GPU will be batchSize/#gpus.
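
You can check the split directly (assuming 2 GPUs are visible; the probe module and tensor sizes below are just for illustration):

```python
import torch
import torch.nn as nn

class ShapeProbe(nn.Module):
    """Prints the shape of the shard each DataParallel replica receives."""
    def forward(self, x):
        print(x.shape)
        return x

probe = nn.DataParallel(ShapeProbe().cuda(), device_ids=[0, 1])
x = torch.randn(4, 3, 256, 256).cuda()   # batchSize = 4

# Each replica prints torch.Size([2, 3, 256, 256]): 4 images / 2 GPUs = 2 per device.
probe(x)
```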

@jjcao
Author

jjcao commented Oct 31, 2017

If it is batchSize/#gpus, then norm still needs to be "instance" for training to succeed. I have tested this.
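
For reference, a multi-GPU training command would then look roughly like this (exact option names may vary by repo version; the key part is --norm instance):

```
python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --norm instance --batchSize 4 --gpu_ids 0,1
```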
