
bug: bugs when use multi-gpu training #116

Closed
cywang97 opened this issue Jul 16, 2020 · 2 comments · Fixed by #131
@cywang97

Hi hiro, I'm using multiple GPUs to train an ASR model. It seems torch.nn.parallel.DataParallel won't scatter the dict input. You should override the scatter function in a CustomDataParallel class to make sure each item in the dict is scattered across the devices.
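
A minimal sketch of such an override, built on PyTorch's public `torch.nn.parallel.scatter` helper. The class name `CustomDataParallel` comes from this report; the dict-only calling convention and the fallback behaviour are assumptions:

```python
import torch
from torch.nn.parallel import DataParallel, scatter


class CustomDataParallel(DataParallel):
    """DataParallel variant that scatters a dict input by splitting
    each value in the dict across the target devices."""

    def scatter(self, inputs, kwargs, device_ids):
        # Fall back to the default behaviour for non-dict inputs.
        if not (inputs and isinstance(inputs[0], dict)):
            return super().scatter(inputs, kwargs, device_ids)

        batch = inputs[0]
        # scatter() returns one chunk per device for each value:
        # tensors are split along dim 0, other objects are replicated.
        # Assumes every value splits into the same number of chunks.
        chunks = {k: scatter(v, device_ids) for k, v in batch.items()}
        num_replicas = len(next(iter(chunks.values())))
        # Rebuild one dict per device and wrap it as a one-element
        # argument tuple, which is what DataParallel.forward expects.
        # Assumes the model is called as model(batch) with no kwargs.
        inputs_per_device = tuple(
            ({k: v[i] for k, v in chunks.items()},)
            for i in range(num_replicas)
        )
        kwargs_per_device = tuple({} for _ in range(num_replicas))
        return inputs_per_device, kwargs_per_device
```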

@hirofumi0810
Owner

@cywang97 Thank you for your report. As I usually use a single GPU, I didn't notice that. I will check and fix it.

@kangj13

kangj13 commented Sep 2, 2020

Hi, I ran into a similar problem using multiple GPUs.
[screenshot: error output showing the model parameters on GPU 0 and the inputs on GPU 1]
It seems all parameters are only on GPU 0 while the inputs are on GPU 1.
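
A quick way to check whether the dict is actually being split before the replicas run. This is a hypothetical diagnostic: the batch keys and shapes are made up, and `model` stands for a `CustomDataParallel` instance like the sketch above:

```python
import torch

# Hypothetical batch layout; the real keys depend on the training code.
batch = {"xs": torch.randn(8, 100, 80).cuda(),
         "xlens": torch.full((8,), 100).cuda()}
device_ids = list(range(torch.cuda.device_count()))

# Call the override directly and print where each chunk lands; every
# replica should get its own slice of the batch on its own device.
inputs, _ = model.scatter((batch,), {}, device_ids)
for replica_id, (chunk,) in enumerate(inputs):
    print(replica_id,
          {k: (tuple(v.shape), str(v.device)) for k, v in chunk.items()})
```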
