Allow DataParallel to wrap CPU modules #17065

Closed
mrshenli opened this issue Feb 13, 2019 · 9 comments
Assignees
Labels
oncall: distributed Add this issue/PR to distributed oncall triage queue

Comments

@mrshenli
Contributor

🚀 Feature

Creating a model on CPU and then wrapping the model with DataParallel should automatically replicate the model on destination GPUs. Is there any reason to enforce that DataParallel's input model must be on GPU?

Motivation

import torch
import torch.nn as nn

model = nn.Linear(2, 2)                          # model parameters live on CPU
net = nn.DataParallel(model, device_ids=[0, 1])
input_var = torch.randn(10, 2)
net(input_var)                                   # error is raised here, at forward time

The code above throws TypeError: Broadcast function not implemented for CPU tensors. To avoid the error, users need to explicitly call model.cuda() before wrapping (a sketch of this workaround follows the list below).

  1. It is confusing whether it is the input tensor or the model's parameters that should be placed on GPU.
  2. When calling nn.DataParallel(model, device_ids=[0,1]), we already have enough information about where the model should be replicated. This can be handled automatically regardless of whether the model is stored on CPU or GPU.
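For reference, a minimal sketch of the current workaround (moving the parameters to the first device in device_ids before wrapping; the device indices are just for illustration):

import torch
import torch.nn as nn

model = nn.Linear(2, 2).cuda(0)                  # parameters must live on a GPU before wrapping
net = nn.DataParallel(model, device_ids=[0, 1])
input_var = torch.randn(10, 2)                   # CPU input is scattered to the GPUs by DataParallel
output = net(input_var)                          # no Broadcast error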

Pitch

Support the above code snippet.

@mrshenli mrshenli self-assigned this Feb 13, 2019
@mrshenli mrshenli added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Feb 13, 2019
@mrshenli
Contributor Author

CC @douwekiela @pietern @soumith

@ssnl
Collaborator

ssnl commented Feb 13, 2019

Creating a model on CPU and then wrapping the model with DataParallel should automatically replicate the model on destination GPUs. Is there any reason to enforce that DataParallel's input model must be on GPU?

This is not true. The model is broadcast at the beginning of each forward, not when constructing the DataParallel wrapper. The disadvantage of having the model on CPU, of course, is that the gradients are reduced to CPU at each iteration, which is slow and undesirable. IMO, automatically moving the model to one GPU upon construction is also undesirable because:

  1. Users may save a pointer to the wrapped module and reasonably expect it to still be on the original device.
  2. It would initialize a CUDA context, which users would not expect from merely constructing the wrapper.

@mrshenli
Contributor Author

@ssnl

How about explicitly throwing an error when constructing DataParallel if the wrapped model is on CPU?

@ssnl
Collaborator

ssnl commented Feb 13, 2019

How about explicitly throwing an error when constructing DataParallel if the wrapped model is on CPU?

This SGTM :)
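For concreteness, a rough sketch of what such a construction-time check could look like (the helper name and error message are hypothetical, not the actual implementation):

# hypothetical helper, called from DataParallel.__init__ (illustrative only)
def _check_module_is_cuda(module):
    for p in module.parameters():
        if not p.is_cuda:
            raise RuntimeError(
                "DataParallel expects the wrapped module to be on a CUDA device; "
                "call model.cuda() before wrapping it")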

@douwekiela

Okay, I think we should update the documentation for this then? Also, what is the best way to then move input_var to GPU? With the .to() semantics, we would have to specify one of the device_ids manually?

@mrshenli
Contributor Author

mrshenli commented Feb 13, 2019

@douwekiela yes, I will update the docs in the fix for this issue.

Also, what is the best way to then move input_var to GPU? With the .to() semantics, we would have to specify one of the device_ids manually?

You don't have to move input_var to GPU. If you prefer to store it on GPU, I think it does not need to be on one of the devices in the device_ids list. To move input_var, any of the following will work (see also the note after the list):

  1. input_var.cuda()
  2. input_var.cuda(1)
  3. input_var.to('cuda:1')
  4. input_var.to(torch.device('cuda:1'))
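Note that none of these move input_var in place; .cuda() and .to() return a new tensor, so the result has to be rebound, e.g.:

device = torch.device('cuda:1')
input_var = input_var.to(device)   # reassign; the original CPU tensor is unchanged
output = net(input_var)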

@douwekiela

Right. So I guess .cuda() automatically assigns the tensor to the correct GPU for DataParallel (@ssnl can you confirm)? If so, that should be part of the code snippet imo ;)

@mrshenli
Contributor Author

.cuda() moves it to the default GPU [link].
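For example (torch.cuda.current_device() reports which GPU that is, and torch.cuda.set_device() changes it):

print(torch.cuda.current_device())   # index of the default GPU, usually 0
x = input_var.cuda()                 # same as input_var.cuda(torch.cuda.current_device())
torch.cuda.set_device(1)             # no-argument .cuda() calls now target GPU 1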

@ssnl
Collaborator

ssnl commented Feb 14, 2019

@douwekiela The recommended way is to define one device object and use it throughout your program. https://pytorch.org/blog/pytorch-0_4_0-migration-guide/ is a helpful reading.
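A minimal sketch of that pattern, reusing the model from the issue description:

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(2, 2).to(device)               # everything follows the single device object
net = nn.DataParallel(model, device_ids=[0, 1]) if torch.cuda.device_count() > 1 else model

input_var = torch.randn(10, 2).to(device)
output = net(input_var)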
