
Double gradient reduction in examples? #122

Closed
TimZaman opened this issue Jan 2, 2019 · 2 comments

@TimZaman (Contributor) commented Jan 2, 2019

First, I'm confused why the example models implement their own DistributedDataParallel module. Why not use the torch one?

DistributedDataParallel is used in the examples:
https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L154

The DistributedDataParallel documentation states: "During the backwards pass, gradients from each node are averaged."

However, a DIY average_gradients function is used in the same example as well: https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L168

Double trouble?
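
For concreteness, here is a minimal sketch of the combination described above, assuming the model is wrapped in torch's DistributedDataParallel and an averaging helper roughly like the one in the example (the names below are illustrative, not copied from the script):

```python
import torch.distributed as dist

def average_gradients(model):
    # Hand-rolled gradient averaging, similar in spirit to the helper linked above.
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# Typical training step when `model` is wrapped in torch.nn.parallel.DistributedDataParallel:
#   loss.backward()            # DDP already all-reduces and averages gradients here
#   average_gradients(model)   # second reduction: sums N identical averaged gradients
#                              # and divides by N, so under this assumption it adds an
#                              # extra all_reduce per parameter without changing the values
#   optimizer.step()
```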

@johnugeorge (Member)

/assign @andreyvelich

@andreyvelich (Member)

Thank you for your issue.
Yes, we can use DistributedDataParallel from the PyTorch library.
We will change it.
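
For reference, a minimal sketch of a training step that relies on torch's DistributedDataParallel alone, with no manual gradient averaging (the model, hyperparameters, and dummy batch below are placeholders, not the example's actual Net or data loader):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Uses the default env:// init, which reads MASTER_ADDR, MASTER_PORT,
    # RANK, and WORLD_SIZE from the environment; "gloo" matches the CPU example.
    dist.init_process_group(backend="gloo")

    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder for the MNIST Net
    ddp_model = DDP(model)  # torch DDP averages gradients across ranks during backward()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    data = torch.randn(32, 1, 28, 28)      # dummy batch for illustration
    target = torch.randint(0, 10, (32,))

    optimizer.zero_grad()
    loss = loss_fn(ddp_model(data), target)
    loss.backward()   # gradient all-reduce/average happens here; no average_gradients() call
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```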
