
DataParallel copies the model onto GPUs sequentially #51385

Open
vadimkantorov opened this issue Jan 29, 2021 · 5 comments
Labels
enhancement (Not as big of a feature, but technically not a bug; should be easy to fix) · module: data parallel · module: performance (Issues related to performance, either of kernel code or framework glue) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

vadimkantorov (Contributor) commented Jan 29, 2021

I have 8 GPUs and can watch the copies happen one after another with watch -n 0.1 nvidia-smi: memory fills up on each GPU sequentially. Could some time be saved by doing the copies asynchronously, in parallel?

The same seems to apply to scattering the input batch across the devices, though I'm less sure about that.

P.S. I know that DP is discouraged in favor of DDP, but for legacy code and for simplicity (also for easier recovery from OOMs and exceptions, and easier logging), DP still remains important.

cc @VitalyFedyunin @ngimel
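
For concreteness, here is a rough micro-benchmark sketch (not part of the issue; the tensor size, iteration count, and helper names are arbitrary) comparing the one-device-at-a-time copy pattern described above against torch.cuda.comm.broadcast, which can use NCCL / peer-to-peer copies where available:

import time
import torch
import torch.cuda.comm as comm

def bench(fn, iters=5):
    fn()  # warm-up
    for d in range(torch.cuda.device_count()):
        torch.cuda.synchronize(d)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    for d in range(torch.cuda.device_count()):
        torch.cuda.synchronize(d)
    return (time.perf_counter() - start) / iters

devices = list(range(torch.cuda.device_count()))
params = torch.randn(64 * 1024 * 1024, device="cuda:0")  # ~256 MB stand-in for the model

# sequential, blocking copies: each GPU is filled one after another
seq = bench(lambda: [params.cuda(d) for d in devices[1:]])

# library broadcast from cuda:0 to all devices in one call
bcast = bench(lambda: comm.broadcast(params, devices))

print(f"sequential copies: {seq * 1e3:.1f} ms   comm.broadcast: {bcast * 1e3:.1f} ms")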

vadimkantorov changed the title from "DataParalell copies the model onto GPUs sequentially" to "DataParallel copies the model onto GPUs sequentially" on Jan 29, 2021
ezyang added the enhancement, module: data parallel, module: performance, and triaged labels on Feb 1, 2021
ezyang (Contributor) commented Feb 1, 2021

This sounds pretty reasonable

vadimkantorov (Contributor, Author) commented:

Maybe even a tree-style broadcast (something like scatter_all) could be used: if there isn't enough bandwidth to copy to all devices at once, then once the first destination device has received its copy, it could itself forward the copy to other devices in parallel with the first GPU, roughly halving the total copy time (and, repeating the idea, the number of replicas can double every round; see the sketch below).
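
A rough illustration of that idea (not an existing PyTorch API; tree_broadcast is made up for the sketch, and whether the per-round device-to-device copies actually overlap depends on peer access being available):

import torch

def tree_broadcast(tensor, devices):
    """`tensor` lives on devices[0]; returns one copy of it per device."""
    copies = {devices[0]: tensor}
    pending = list(devices[1:])
    while pending:
        sources = list(copies.items())
        this_round, pending = pending[:len(sources)], pending[len(sources):]
        # every device that already holds a copy feeds one new device, so the
        # number of replicas doubles each round (3 rounds for 8 GPUs instead
        # of 7 back-to-back copies out of GPU 0)
        for (src_dev, src), dst_dev in zip(sources, this_round):
            copies[dst_dev] = src.to(f"cuda:{dst_dev}", non_blocking=True)
        for d in devices:
            torch.cuda.synchronize(d)  # wait for this round's copies to land
    return [copies[d] for d in devices]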

ngimel (Collaborator) commented Feb 1, 2021

Are you talking about the initial parameter copy? During training, broadcast/reduce should not copy data to the CPU.

vadimkantorov (Contributor, Author) commented Feb 1, 2021

Yes, the initial parameter copy. But the model is already on the first GPU, since I'm doing:

model = Model().cuda()          # parameters now live on cuda:0
model = nn.DataParallel(model)  # replication to the remaining GPUs happens inside DataParallel

So the replication should already be amenable to direct inter-GPU (peer-to-peer) copies, where that's enabled.
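
As a concrete (hypothetical) sketch of what that could look like, one can check peer access between devices and broadcast all parameters from cuda:0 in a single coalesced call via the public torch.cuda.comm helpers; whether the underlying copies actually overlap presumably depends on NCCL / peer access being available on the machine. The model below is just a placeholder:

import torch
import torch.cuda.comm as comm
import torch.nn as nn

devices = list(range(torch.cuda.device_count()))

# peer access lets GPU-to-GPU copies bypass host memory entirely
for d in devices[1:]:
    print(f"cuda:0 -> cuda:{d} peer access:", torch.cuda.can_device_access_peer(0, d))

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda(0)

# broadcast all parameters living on cuda:0 to every device in one coalesced call;
# replicated[i] is the list of parameter copies placed on devices[i]
params = [p.detach() for p in model.parameters()]
replicated = comm.broadcast_coalesced(params, devices)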

ngimel (Collaborator) commented Feb 1, 2021

We will accept a PR implementing that.
