Update torch.nn.init and torch.nn.utils.clip_grad #6173
Conversation
@pytorchbot test this please
torch/nn/utils/clip_grad.py
Outdated
    Gradients are modified in-place.
    Arguments:
        parameters (Iterable[Variable]): an iterable of Variables that will have
Can we have some tests please?
torch/nn/utils/clip_grad.py
Outdated
clip_value (float or int): maximum allowed value of the gradients | ||
The gradients are clipped in the range [-clip_value, clip_value] | ||
""" | ||
parameters = list(filter(lambda p: p.grad is not None, parameters)) |
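For readers skimming the thread, here is a minimal sketch of the kind of value-clipping utility under review, assembled from the fragments quoted in this conversation rather than copied from the merged code; details such as the single-tensor shortcut are assumptions.

```python
import torch


def clip_grad_value_(parameters, clip_value):
    """Clip the gradients of an iterable of parameters at the given value.

    Gradients are modified in-place, element-wise, into [-clip_value, clip_value].
    """
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]  # allow passing a single tensor (assumption)
    clip_value = float(clip_value)
    for p in filter(lambda p: p.grad is not None, parameters):
        p.grad.data.clamp_(min=-clip_value, max=clip_value)
```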
Might be good to have this conform to the _ suffix convention, as it's an in-place operation.
@pytorchbot test this please
Well, there are already other clip functions which don't have a _ suffix. If we change this one, the rest of them should change too (and we'd also need to add the BC-compat code...)
The only check that failed comes from "short-perf-test-cpu", which is unrelated to the PyTorch tests.
@pytorchbot retest this please
Thanks @ezyang!
I think it is reasonable to change the name to have the _ suffix. We did this for the init methods, and there are really just two grad clip methods, including the one added in this PR...
Good point @ssnl, let's get it done.
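For context on the convention being referenced: in torch, functions and Tensor methods that mutate their input in place carry a trailing underscore, while the underscore-free variants return a new tensor. A tiny illustration:

```python
import torch

t = torch.tensor([-3.0, 0.5, 4.0])

clipped = t.clamp(-1.0, 1.0)  # out-of-place: t is unchanged, a new tensor is returned
print(t)        # still [-3.0, 0.5, 4.0]
print(clipped)  # [-1.0, 0.5, 1.0]

t.clamp_(-1.0, 1.0)           # in-place: the trailing underscore mutates t itself
print(t)        # now [-1.0, 0.5, 1.0]
```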
@@ -1,12 +1,12 @@
-def clip_grad_norm(parameters, max_norm, norm_type=2):
+def clip_grad_norm_(parameters, max_norm, norm_type=2):
torch/nn/utils/clip_grad.py
Outdated
    """
    warnings.warn("torch.nn.utils.clip_grad_norm is now deprecated in favor "
                  "of torch.nn.utils.clip_grad_norm_.",
                  category=DeprecationWarning, stacklevel=2)
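The hunk above shows only the warning call. For context, a backward-compatible wrapper along the lines being discussed would keep the old name and forward to the renamed in-place function; a minimal sketch follows (the commit list at the end notes that the DeprecationWarning category was later removed, so the exact warning category is an assumption):

```python
import warnings

from torch.nn.utils import clip_grad_norm_  # the renamed in-place function


def clip_grad_norm(parameters, max_norm, norm_type=2):
    """Deprecated alias kept for backward compatibility; use clip_grad_norm_ instead."""
    warnings.warn("torch.nn.utils.clip_grad_norm is now deprecated in favor "
                  "of torch.nn.utils.clip_grad_norm_.",
                  stacklevel=2)
    return clip_grad_norm_(parameters, max_norm, norm_type)
```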
The tests are stuck even though their console output shows them being done.
LGTM, but @apaszke might want to take an extra look.
Sounds good @ssnl, and thanks for reviewing these changes.
Almost good to go! Three minor things that could be improved.
test/test_nn.py
Outdated
    grads = torch.arange(-50, 50).view(10, 10).div(5), torch.ones(10).mul(2)
    for p, g in zip(l.parameters(), grads):
        p._grad = Variable(g.clone().view_as(p.data))
torch/nn/utils/clip_grad.py
Outdated
            The gradients are clipped in the range [-clip_value, clip_value]
    """
    clip_value = float(clip_value)
    for p in list(filter(lambda p: p.grad is not None, parameters)):
Hi @tonybeltramelli, are you planning to finish this PR soon? If not, I can fix the minor things for you :)
@ssnl sorry for the delay! I just pushed these minor fixes.
torch/nn/utils/clip_grad.py
Outdated
        Total norm of the parameters (viewed as a single vector).
    """
-   parameters = list(filter(lambda p: p.grad is not None, parameters))
+   parameters = filter(lambda p: p.grad is not None, parameters)
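One note on this hunk, independent of the hidden comments: in Python 3, filter returns a one-shot iterator, so if the surviving parameters are traversed more than once (for example, once to accumulate the total norm and once to rescale the gradients), the result has to be materialized first. A small standalone illustration of the pitfall:

```python
params = [0, 1, 2, 3]

it = filter(lambda x: x > 1, params)
first_pass = list(it)    # [2, 3]
second_pass = list(it)   # [] -- the iterator is already exhausted

safe = list(filter(lambda x: x > 1, params))
assert list(safe) == [2, 3]
assert list(safe) == [2, 3]  # a list can be iterated any number of times
```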
test/test_nn.py
Outdated
    grads = torch.arange(-50, 50).view(10, 10).div(5), torch.ones(10).mul(2)
    for p, g in zip(l.parameters(), grads):
        p._grad = Variable(g.clone().view_as(p.data))
    clip_grad_value_(l.parameters(), clip_value)
    for p in filter(lambda p: p.grad is not None, l.parameters()):
        self.assertLessEqual(p.grad.data.max(), clip_value)
        self.assertGreaterEqual(p.grad.data.min(), -clip_value)
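Putting the quoted fragments together, a self-contained version of the value-clipping test might read as the sketch below. It is assembled from the hunks above, written without the Variable wrapper the reviewers want removed, and uses an assumed clip_value of 2.5, so it is not the exact code that was merged:

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_value_


def test_clip_grad_value():
    l = nn.Linear(10, 10)
    clip_value = 2.5  # assumed threshold; the real test's value is not shown in this thread

    # Gradients that deliberately exceed the clip range on both sides.
    grads = torch.arange(-50., 50.).view(10, 10).div_(5), torch.ones(10).mul_(2)
    for p, g in zip(l.parameters(), grads):
        p.grad = g.clone().view_as(p)

    clip_grad_value_(l.parameters(), clip_value)

    for p in filter(lambda p: p.grad is not None, l.parameters()):
        assert p.grad.max().item() <= clip_value
        assert p.grad.min().item() >= -clip_value
```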
@tonybeltramelli the code looks good, but I'd really like to get rid of the Variable wrapper.
@apaszke doesn't want Variable in the commit.
If we can merge this, I'll remove the Variable wrapper in #6641.
@ssnl is planning to fix the Variable problem in a codemod, so this is OK to go in.
@apaszke Thanks, and sorry about that. @ssnl and @ezyang, thank you, and sorry for my slow response time this week!
@tonybeltramelli No worries. The tests are that way because we haven't gotten around to updating them (fully). It was already quite some work for me to update part of those in #6641 ...
@ssnl Makes total sense, PyTorch is becoming a beast! :)
My local tests on a fresh build are failing; I wonder how CI is passing.
@ngimel My local test script imports fine. Do you have prior binary installs that were not properly cleaned?
Ah, right, I think it's a prior install that I forgot to clean. Sorry for the noise.
Introducing two updates.

1. Add param to He initialization scheme in torch.nn.init
Problem solved: The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.

2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved: DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, clip_grad.py only had a function to clip the gradient norm.

* add param to He initialization scheme in torch.nn.init
* add util to clip gradient value in torch/nn/utils/clip_grad.py
* update doc in torch.nn.utils.clip_grad
* update and add test for torch.nn.utils.clip_grad
* update function signature in torch.nn.utils.clip_grad to match suffix_ convention
* ensure backward compatibility in torch.nn.utils.clip_grad
* remove DeprecationWarning in torch.nn.utils.clip_grad
* extend test and implementation of torch.nn.utils.clip_grad
* update test and implementation of torch.nn.utils.clip_grad
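To make the summary above concrete, here is a short usage sketch of the two additions: passing the non-linearity to the He/Kaiming initializer and clipping gradients element-wise by value. The function names follow the current torch.nn API and may differ slightly from the names in use at the time of this PR:

```python
import torch
import torch.nn as nn
from torch.nn import init
from torch.nn.utils import clip_grad_value_

model = nn.Linear(128, 64)

# 1. He/Kaiming initialization, telling the initializer which non-linearity
#    follows the layer so that the appropriate gain is applied internally.
init.kaiming_uniform_(model.weight, nonlinearity='relu')
init.zeros_(model.bias)

# 2. After the backward pass, clip every gradient element to [-1.0, 1.0].
loss = model(torch.randn(32, 128)).sum()
loss.backward()
clip_grad_value_(model.parameters(), clip_value=1.0)
```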