Multi-GPU training could not work normally? #12

GuangChen2016 · 2022-06-12T04:53:05Z

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel;
As suggested, I modified the model by adding find_unused_parameters=True, as follows: model = DistributedDataParallel(model, device_ids=[rank], find_unused_parameters=True).to(device). However, I still get the same error. Are you able to train normally with multiple GPUs? Any suggestions on how to fix this?
Many thanks.
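
For anyone hitting the same error, one way to narrow it down is to run a single forward/backward pass in one process, without DistributedDataParallel, and list the trainable parameters whose .grad is still None. Those are the parameters that did not take part in producing the loss. The sketch below is only an illustration under that assumption: ToyModel and find_unused_parameter_names are hypothetical names and are not part of this repository.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in model: `unused` never contributes to the output."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # defined but never called in forward()

    def forward(self, x):
        return self.used(x)


def find_unused_parameter_names(model, loss):
    """Return names of trainable parameters that received no gradient from `loss`."""
    model.zero_grad(set_to_none=True)  # make sure stale gradients do not hide anything
    loss.backward()
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = ToyModel()
loss = model(torch.randn(8, 4)).sum()
unused_names = find_unused_parameter_names(model, loss)
print(unused_names)
```

With a real model, replace ToyModel with the actual network and compute the loss exactly as the training loop does. Any names that are reported point at modules that are constructed but skipped for that batch or configuration.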

keonlee9420 · 2022-07-31T05:39:55Z

Hi @GuangChen2016, sorry for the late response. Could you please set this value to 1?
