Use torch.nn.DataParallel for intra-node computation #46
Comments
Wouldn't it be more appropriate to use …?
I think it should be a standard baseline for us to compare with, and we should check both of …. BTW, I think the …
DDP overlaps gradient computation with communication, so the effect should be noticeable. How does it compare to our reference implementations?
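For context, a minimal DistributedDataParallel sketch (assumptions: a single node, one process per GPU launched via torchrun, the NCCL backend, and a placeholder model and batch). DDP buckets gradients and starts all-reducing each bucket while the rest of the backward pass is still running, which is where the overlap comes from:

```python
# Minimal single-node DDP sketch. Assumed launch:
#   torchrun --nproc_per_node=<num_gpus> ddp_example.py
# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    # Placeholder model and batch; a real benchmark would use its own.
    model = nn.Linear(1024, 10).to(device)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(32, 1024, device=device)
    targets = torch.randint(0, 10, (32,), device=device)

    loss = nn.functional.cross_entropy(ddp_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()   # gradient all-reduce is overlapped with this call
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```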
Closing in favor of mlbench/mlbench-benchmarks#69
Original issue description: It might be a good choice to use torch.nn.DataParallel for intra-node computation (across multiple GPUs) and intra-node gradient aggregation, and then use different communication backends for inter-node communication.
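A minimal sketch of the scheme described above, assuming one process per node, rendezvous through environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE set by the launcher), and NCCL as the inter-node backend (gloo or MPI could be swapped in to compare). The helper names and the manual gradient loop are illustrative, not an existing mlbench implementation:

```python
# Hybrid sketch: nn.DataParallel handles intra-node work (batch split across
# local GPUs, gradients summed back onto the wrapped module's parameters),
# and an explicit all_reduce over the chosen torch.distributed backend
# aggregates gradients across nodes.
import torch
import torch.distributed as dist
import torch.nn as nn

def setup(backend="nccl"):
    # One process per node; MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are
    # assumed to be provided by the cluster launcher.
    dist.init_process_group(backend=backend)

def train_step(model, inputs, targets, optimizer, world_size):
    # Intra-node: DataParallel scatters the batch over local GPUs and
    # accumulates the summed gradients on the wrapped module's parameters.
    loss = nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()

    # Inter-node: average gradients across nodes with the selected backend.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    setup(backend="nccl")  # swap in "gloo" or "mpi" to compare backends
    net = nn.DataParallel(nn.Linear(1024, 10).cuda())
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    x = torch.randn(64, 1024).cuda()
    y = torch.randint(0, 10, (64,)).cuda()
    train_step(net, x, y, opt, dist.get_world_size())
```

Note that in this scheme the inter-node all-reduce only starts after the whole backward pass has finished, whereas DDP overlaps it with backward, which is the difference the comment above points to.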