
Use torch.nn.DataParallel for intra-node computation #46

Closed
tlin-taolin opened this issue Apr 3, 2020 · 4 comments

tlin-taolin commented Apr 3, 2020

It might be a good choice to use torch.nn.DataParallel for intra-node computation (across the multiple GPUs of a node) and intra-node gradient aggregation, and then use a different communication backend for inter-node communication.
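A rough sketch of what this hybrid setup could look like (the tiny model and the `aggregate_across_nodes` helper are placeholders, not part of the mlbench codebase; it assumes one process per node with the usual MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables set):

```python
import torch
import torch.distributed as dist
import torch.nn as nn

# Intra-node: DataParallel replicates the model over the node's visible GPUs
# and reduces the gradients onto the default GPU during backward().
model = nn.DataParallel(nn.Linear(128, 10).cuda())  # placeholder model

# Inter-node: one process per node joins a process group that is used only
# for cross-node communication; the backend is independent of DataParallel.
dist.init_process_group(backend="gloo", init_method="env://")

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def aggregate_across_nodes(module):
    # Hypothetical helper: average gradients across nodes; the intra-node
    # reduction was already done by DataParallel during backward().
    world_size = dist.get_world_size()
    for p in module.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)

# One training step (sketch).
inputs = torch.randn(32, 128).cuda()
targets = torch.randint(0, 10, (32,)).cuda()
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
aggregate_across_nodes(model)
optimizer.step()
optimizer.zero_grad()
```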

tlin-taolin changed the title from "Use Data" to "Use torch.nn.DataParallel for in-node computation" on Apr 3, 2020
tlin-taolin changed the title from "Use torch.nn.DataParallel for in-node computation" to "Use torch.nn.DataParallel for intra-node computation" on Apr 3, 2020
ehoelzl (Contributor) commented Apr 3, 2020

Wouldn't it be more appropriate to use DistributedDataParallel, as referenced here?
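For reference, a minimal DistributedDataParallel setup looks roughly like this (one process per GPU; it assumes a launcher that sets LOCAL_RANK and the rendezvous environment variables, e.g. torchrun or torch.distributed.launch with --use_env; the model is a placeholder):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; rank, world size and local rank come from the launcher.
dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(128, 10).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])  # DDP all-reduces gradients during backward()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step (sketch); gradient synchronization happens inside backward().
inputs = torch.randn(32, 128).cuda(local_rank)
targets = torch.randint(0, 10, (32,)).cuda(local_rank)
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```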

tlin-taolin (Author) commented

I think it should be a standard baseline for us to compare with, and we should check both DataParallel and DistributedDataParallel for the intra-node case. That said, I am not sure whether we can directly use DistributedDataParallel for intra-node communication in our current framework (it also requires running init_process_group).

BTW, I think DistributedDataParallel is mainly designed/optimized for centralized training across multiple nodes, and it loses the flexibility to use different communication strategies (e.g. compressed gradients) or different communication topologies (e.g. a ring topology for decentralized training). It would be good to get results from DistributedDataParallel (for distributed training) and compare them with our sync scheme.
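To make the flexibility point concrete, here is a rough sketch of the kind of neighbor-only (ring) gradient averaging that plain torch.distributed point-to-point primitives allow but DDP's built-in all-reduce does not expose. `ring_average_gradients` is a hypothetical helper, not our actual sync scheme, and it assumes a backend with send/recv support (e.g. Gloo, with gradients on CPU):

```python
import torch
import torch.distributed as dist

def ring_average_gradients(model):
    # Decentralized sketch: each worker averages its gradients only with its
    # two neighbors on a ring, instead of the global all-reduce done by DDP.
    rank, world = dist.get_rank(), dist.get_world_size()
    left, right = (rank - 1) % world, (rank + 1) % world
    for p in model.parameters():
        if p.grad is None:
            continue
        from_left = torch.empty_like(p.grad)
        from_right = torch.empty_like(p.grad)
        # Non-blocking send/recv to both neighbors to avoid deadlocks.
        reqs = [
            dist.isend(p.grad, dst=right),
            dist.isend(p.grad, dst=left),
            dist.irecv(from_left, src=left),
            dist.irecv(from_right, src=right),
        ]
        for req in reqs:
            req.wait()
        # Local averaging with the two neighbors only.
        p.grad.copy_((p.grad + from_left + from_right) / 3.0)
```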

martinjaggi (Member) commented
DDP overlaps gradient computation with communication; the effect should be noticeable. How does it compare to our reference implementations?
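One crude way to check how much of the communication is hidden is to time a step with and without gradient synchronization (DDP's no_sync() context manager skips the all-reduce); if the two timings are close, the all-reduce is well overlapped with backward(). A sketch, assuming ddp_model is already wrapped as in the snippet above:

```python
import contextlib
import time
import torch
import torch.nn as nn

def time_step(ddp_model, inputs, targets, sync=True):
    # Times one forward/backward pass; with sync=False, DDP's no_sync()
    # disables the (overlapped) gradient all-reduce entirely.
    ctx = contextlib.nullcontext() if sync else ddp_model.no_sync()
    torch.cuda.synchronize()
    start = time.time()
    with ctx:
        loss = nn.functional.cross_entropy(ddp_model(inputs), targets)
        loss.backward()
    torch.cuda.synchronize()
    ddp_model.zero_grad()
    return time.time() - start
```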

ehoelzl (Contributor) commented Dec 4, 2020

Closing in favor of mlbench/mlbench-benchmarks#69

ehoelzl closed this as completed on Dec 4, 2020
The mlbench-3.1.0 automation moved this from To do to Done on Dec 4, 2020
ehoelzl removed this from Done in mlbench-3.1.0 on Dec 4, 2020