
Double gradient reduction in examples? #122

Closed
TimZaman opened this issue Jan 2, 2019 · 2 comments

@TimZaman (Contributor) commented Jan 2, 2019

First, I'm confused why the example models implement their own DistributedDataParallel module. Why not use the torch one?

DistributedDataParallel is used in the examples:
https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L154

The DistributedDataParallel documentation states: "During the backwards pass, gradients from each node are averaged."

However, a DIY average_gradients function is used in the same example as well: https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L168

Double trouble?
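
For concreteness, here is a minimal sketch of the combination described above, assuming the model is wrapped in torch's DistributedDataParallel and an averaging helper roughly like the one in the example (the names below are illustrative, not copied from the script):

```python
import torch.distributed as dist

def average_gradients(model):
    # Hand-rolled gradient averaging, similar in spirit to the helper linked above.
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# Typical training step when `model` is wrapped in torch.nn.parallel.DistributedDataParallel:
#   loss.backward()            # DDP already all-reduces and averages gradients here
#   average_gradients(model)   # second reduction: sums N identical averaged gradients
#                              # and divides by N, so under this assumption it adds an
#                              # extra all_reduce per parameter without changing the values
#   optimizer.step()
```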

@johnugeorge (Member)

/assign @andreyvelich

@andreyvelich (Member)

Thank you for your issue.
Yes, we can use DistributedDataParallel from the PyTorch library.
We will change it.
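
For reference, a minimal sketch of a training step that relies on torch's DistributedDataParallel alone, with no manual gradient averaging (the model, hyperparameters, and dummy batch below are placeholders, not the example's actual Net or data loader):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Uses the default env:// init, which reads MASTER_ADDR, MASTER_PORT,
    # RANK, and WORLD_SIZE from the environment; "gloo" matches the CPU example.
    dist.init_process_group(backend="gloo")

    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder for the MNIST Net
    ddp_model = DDP(model)  # torch DDP averages gradients across ranks during backward()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    data = torch.randn(32, 1, 28, 28)      # dummy batch for illustration
    target = torch.randint(0, 10, (32,))

    optimizer.zero_grad()
    loss = loss_fn(ddp_model(data), target)
    loss.backward()   # gradient all-reduce/average happens here; no average_gradients() call
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```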
