Add forward AD formulas for {batch,layer,group}_norm #70355
Conversation
Looks good.
There are some small optimizations we want to do though.
```cpp
auto view_size = input_t.sizes().vec();
int64_t numel = 1;
for (const auto dim : c10::irange(view_size.size())) {
  if (dim != 1) {
```
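The hunk above is cut off by the diff view after the `if`. Its apparent intent is to build a broadcastable view shape for the per-channel statistics and to count the elements reduced per channel; here is a minimal Python sketch of that assumed logic (the loop body past the `if` is a reconstruction, not the PR's actual code):

```python
import torch

input_t = torch.randn(2, 3, 4, 5)     # hypothetical [N, C, H, W] tangent

view_size = list(input_t.shape)
numel = 1
for dim in range(len(view_size)):
    if dim != 1:                      # dim 1 is the channel dim in batch norm
        numel *= view_size[dim]       # count elements reduced per channel
        view_size[dim] = 1            # collapse -> [1, C, 1, 1], broadcastable

# view_size == [1, 3, 1, 1]; numel == 2 * 4 * 5 == 40
```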
Batch norm doesn't support no-batch-dim inputs?
Looking at #60585:
- Yes, batch norm is semantically incompatible with no-batch-dim inputs.
- For group norm it is BC-breaking, so a potential plan seems to be to create a new `nn.Module` entirely that would handle the no-batch-dim case.
- Layer norm should be fine as long as we handle the case where `normalized_shape` has the same number of dimensions as the input (see the sketch below). I do see a test case in OpInfo handling this, so we should be OK.

cc @jbschlosser
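To make the layer-norm bullet concrete, a quick sketch (shapes arbitrary) of the no-batch-dim case, where `normalized_shape` spans every input dimension:

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 4, 5)                            # no leading batch dim
out = F.layer_norm(x, normalized_shape=(3, 4, 5))   # normalize over all dims
assert out.shape == x.shape
```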
```cpp
if (train) {
  mean_p = saved_mean.view(view_size);
  invstd_p = saved_invstd.view(view_size);
  auto invstd_t = -invstd_p.pow(3) * ((input_t - input_t.mean(dims, true)) * (input_p - mean_p)).sum(dims, true) / numel;
```
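For reference, the math behind that `invstd_t` line (a standard forward-mode derivation, reconstructed here rather than quoted from the PR). With per-channel statistics $\mu = \frac{1}{N}\sum_i x_i$, $\sigma^2 = \frac{1}{N}\sum_i (x_i - \mu)^2$, and $s = (\sigma^2 + \epsilon)^{-1/2}$, and writing $\dot{x}$ for the input tangent:

$$
\dot{s} = -\tfrac{1}{2}\,(\sigma^2 + \epsilon)^{-3/2}\,\dot{\sigma^2}
        = -\frac{s^3}{N}\sum_i \bigl(\dot{x}_i - \bar{\dot{x}}\bigr)\bigl(x_i - \mu\bigr),
$$

using $\dot{\sigma^2} = \frac{2}{N}\sum_i (x_i - \mu)(\dot{x}_i - \dot{\mu})$ and the fact that any constant shift (such as $\dot{\mu}$ or $\bar{\dot{x}}$) sums to zero against $(x_i - \mu)$. This is exactly `-invstd_p.pow(3) * ((input_t - input_t.mean(dims, true)) * (input_p - mean_p)).sum(dims, true) / numel`.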
There are quite a few things that could be done in-place here, and I think we want to do them, keeping "composite compliance" in mind.
cc @zou3519: is there a doc/wiki/inline comment I can link here that states the rules for what is allowed or not?
This: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md#compositeimplicitautograd-compliance, but it doesn't include the information about in-place ops yet.
Thanks!
Should we have a wiki or doc note that points to these?
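To make the in-place suggestion above concrete, a hypothetical sketch (in Python for brevity; the names mirror the C++ hunk) of how the `invstd_t` line could reuse its own intermediates. As I understand the compliance rules, mutating tensors the formula itself allocated is fine; mutating its inputs is not:

```python
import torch

def invstd_jvp(input_p, input_t, mean_p, invstd_p, dims, numel):
    # Same math as: -invstd_p^3 * ((input_t - mean(input_t)) * (input_p - mean_p))
    #                   .sum(dims) / numel
    tmp = input_t - input_t.mean(dims, keepdim=True)  # fresh allocation
    tmp.mul_(input_p - mean_p)                        # in-place: tmp is ours
    out = tmp.sum(dims, keepdim=True)
    return out.mul_(invstd_p.pow(3)).div_(-numel)     # in-place on our intermediate
```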
@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Update looks good!
Stack from ghstack:
Differential Revision: [D33405362](https://our.internmc.facebook.com/intern/diff/D33405362)
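For anyone landing here later, a minimal sketch of what these formulas enable through the public forward-AD API (assuming a build that includes this PR; the shapes are arbitrary):

```python
import torch
import torch.nn.functional as F
import torch.autograd.forward_ad as fwAD

x = torch.randn(4, 3, 8, 8)
t = torch.randn_like(x)              # tangent: the direction of the JVP
weight, bias = torch.randn(3), torch.randn(3)

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, t)
    out = F.batch_norm(dual_x, running_mean=None, running_var=None,
                       weight=weight, bias=bias, training=True)
    primal, jvp = fwAD.unpack_dual(out)  # jvp = J(batch_norm) @ t
```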