Fix multi-GPU runtime error with multi-scale netD #40
This PR solves #34.

About the original issue: when `torch.nn.DataParallel` replicates a `D_NLayersMulti` object to multiple devices, all registered submodules are replicated, but `self.model` (an instance of `ListModule`) is not, because PyTorch doesn't know how to copy it correctly. As a result, every replicated `D_NLayersMulti` instance has `self.model` pointing to the same `ListModule` object, whose `module` attribute still points to the original `D_NLayersMulti` object from before replication. You can check `id(self.model)` in `D_NLayersMulti.forward` to verify this.

I removed the usage of the `ListModule` class and made `D_NLayersMulti.forward` call the submodules directly.
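For reference, here is a minimal sketch of the fixed pattern. The layer stack is illustrative, not the real discriminator architecture; the point is that each per-scale discriminator is registered as a proper child module (via `add_module`) and fetched directly in `forward`, so `DataParallel` replication works:

```python
import torch.nn as nn


class D_NLayersMulti(nn.Module):
    def __init__(self, input_nc=3, ndf=64, num_D=2):
        super().__init__()
        self.num_D = num_D
        self.down = nn.AvgPool2d(3, stride=2, padding=1,
                                 count_include_pad=False)
        for i in range(num_D):
            # add_module registers each per-scale discriminator as a
            # real child module, so DataParallel replicates it along
            # with everything else. (Layers here are placeholders.)
            self.add_module('model_%d' % i,
                            nn.Sequential(
                                nn.Conv2d(input_nc, ndf, 4, 2, 1),
                                nn.LeakyReLU(0.2, True),
                                nn.Conv2d(ndf, 1, 4, 1, 1)))

    def forward(self, x):
        results = []
        down = x
        for i in range(self.num_D):
            # Fetch the (possibly replicated) child directly instead of
            # going through a non-Module container like ListModule,
            # which would still reference the pre-replication object.
            model = getattr(self, 'model_%d' % i)
            results.append(model(down))
            if i != self.num_D - 1:
                down = self.down(down)
        return results
```

An `nn.ModuleList` would solve the same problem, since replication handles any properly registered submodule; the failure only occurs when modules are held in a plain Python object that PyTorch cannot see.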