Fix nn.DataParallel compatibility in PyTorch 1.5 #4300
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4300 +/- ##
==========================================
- Coverage 78.21% 78.21% -0.01%
==========================================
Files 120 120
Lines 20038 20040 +2
==========================================
+ Hits 15673 15674 +1
- Misses 4365 4366 +1
Continue to review full report at Codecov.
Update: Also asked about this on the shared PyTorch Slack channel.

I don't see anything obviously wrong, but I'm not very familiar with your codebase. You are looking for tensor attributes - are you sure that you don't have any other tensor attributes that don't correspond to former parameters?
```diff
@@ -110,11 +110,31 @@ def reset_memory_hooks_state(self):

     @property
     def device(self) -> device:
-        return next(self.parameters()).device
+        try:
```
I think it's a great idea to have this as a property here! :-)
```diff
@@ -41,7 +41,7 @@ class CTRLModelTester(object):
     def __init__(
         self,
         parent,
-        batch_size=13,
+        batch_size=14,
```
We need an even number for `batch_size`? Or is it because 13 is an unlucky number? :D
Yes, for some (but not all, for some reason) of the models, the batch size seems to need to be a multiple of the number of DataParallel replicas.
I didn't investigate too much as to why.
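A quick plain-Python sketch (not DataParallel itself, just the chunking arithmetic) of why an odd batch size can be awkward: the scatter step splits the batch along dim 0 into near-equal chunks, so `batch_size=13` on 2 replicas yields uneven chunks, while 14 splits evenly. Some models may choke on replicas receiving different batch sizes.

```python
def chunk_sizes(batch_size, num_replicas):
    """Sizes of the per-replica chunks when a batch is split along dim 0,
    mirroring the near-equal split torch.Tensor.chunk performs."""
    base, extra = divmod(batch_size, num_replicas)
    # The first `extra` replicas receive one extra example each.
    return [base + 1] * extra + [base] * (num_replicas - extra)

print(chunk_sizes(13, 2))  # [7, 6] -> replicas see different batch sizes
print(chunk_sizes(14, 2))  # [7, 7] -> even split
```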
LGTM!
LGTM
LGTM
I think you forgot GPT-2:

Yes, if I remember correctly I didn't try to remove all the calls to
The same type of errors as in huggingface#4300
The same type of errors as in #4300
As reported in #3936
PyTorch 1.5 changes the handling of `nn.Parameters` in `DataParallel` replicas (pytorch/pytorch#33907). The replicas now don't have `parameters()` anymore.

This PR updates our `self.device` and `self.dtype` helpers to mimic `nn.Module`'s `parameters()` helper, but for attributes, i.e. recursively look for an attribute of type `Tensor`. Reformer and TransfoXL seem to be doing fancy things based on the module's Parameters, so I didn't attempt to fix them.

Finally, I'm introducing a `multigpu` CI flag. CI does not currently run on multiple GPUs, so remember to run it locally.

Also pinging @ngimel, the author of the change in PyTorch, to check if I'm doing something stupid here.