Add forward AD inplace check and fix codegen #60498
Conversation
💊 CI failures summary and remediations (as of commit 7ad9415; more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns. They do not appear to be due to upstream breakages: the "Run mypy" job (🔁 rerun available).
You can see the codegened code here (outdated), which is the same commits but with the generated files checked in.
The PR body should contain more description of how the forward AD inplace check works and what fixes were applied. I think I know the gist from our discussions, but a summary is nice; anyone who comes along in the future and finds this commit may be confused.
tools/autograd/gen_variable_type.py (Outdated)
FW_DERIVATIVE_BOOLEAN_TEMPLATE = CodeTemplate("""\
auto ${boolean_var_name} = ${cond};
(void)${boolean_var_name};
""")
Is there a way to check that the compiler optimized away this code if it's not used?
auto _any_has_forward_grad_result = isFwGradDefined(self) || isFwGradDefined(other);
(void)_any_has_forward_grad_result;
I remember sometimes the compiler gets confused about whether it's safe to optimize things away (e.g. if the compiler thinks isFwGradDefined has side effects).
From checking the assembly code, the calls to isFwGradDefined are indeed not optimized out.
I'll add the appropriate logic to avoid these calls for out= functions.
RIP, I wish the compiler were smarter
I think it could be if they were inlined and only calling simple functions, but that's not the case here :/
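As a purely hypothetical sketch of the "avoid these calls for out= functions" idea mentioned above (not the actual diff): the codegen can simply skip emitting the boolean, and therefore the isFwGradDefined(...) calls the compiler cannot elide, for overloads that never get a forward grad set up. Here `f` stands for a tools.codegen.model.NativeFunction as used in gen_variable_type.py, and `string.Template` again stands in for CodeTemplate.

```python
from string import Template

FW_DERIVATIVE_BOOLEAN_TEMPLATE = Template(
    "auto ${boolean_var_name} = ${cond};\n(void)${boolean_var_name};\n"
)

def emit_any_has_forward_grad(f, cond: str) -> str:
    # out= overloads never have forward grads set up, so don't emit the check
    # at all rather than relying on the compiler to remove an unused result.
    if f.func.is_out_fn():
        return ""
    return FW_DERIVATIVE_BOOLEAN_TEMPLATE.substitute(
        boolean_var_name="_any_has_forward_grad_result", cond=cond
    )
```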
Looks good to me! Just have some small questions.
@@ -761,7 +762,7 @@
 - name: polygamma_(Tensor(a!) self, int n) -> Tensor(a!)
   self: grad * polygamma(n + 1, self)
-  result: auto_element_wise
+  result: self_t.mul_(polygamma(n + 1, original_self_p))
There used to be some conj here? But I'm guessing that's okay because polygamma doesn't support complex.
It also looks like we avoid doing the mul by self_t out of place and then a copy_.
Are these optimizations that we could automate with auto_element_wise/auto_linear as well?
We have a grand total of ~10 functions that have formulas for the inplace version specifically, so I don't think we want to bother adding such logic to these automations.
Ah, good point.
What is the reasoning behind having functions with formulas specifically for the inplace variant anyway?
A few of them exist because there is no out-of-place version.
A few others are for perf: they avoid cloning the input by providing an alternative implementation that only makes use of the output.
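As a hedged, illustrative sanity check of what the new polygamma_ entry in the diff above computes (this assumes a PyTorch build that includes this change and the Python forward-mode AD API in torch.autograd.forward_ad): the tangent after x.polygamma_(n) should equal self_t * polygamma(n + 1, original_self_p).

```python
import torch
import torch.autograd.forward_ad as fwAD

n = 2
x = torch.rand(4) + 0.5   # keep inputs in polygamma's well-behaved range
t = torch.randn(4)

with fwAD.dual_level():
    # Clone so the inplace op does not touch x or t themselves.
    dual = fwAD.make_dual(x.clone(), t.clone())
    dual.polygamma_(n)                      # inplace op on a dual tensor
    _, tangent = fwAD.unpack_dual(dual)

# self_t.mul_(polygamma(n + 1, original_self_p)) from the diff above:
expected = t * torch.polygamma(n + 1, x)
print(torch.allclose(tangent, expected))    # expected: True
```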
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D29914593](https://our.internmc.facebook.com/intern/diff/D29914593)
Sorry for the delayed review; I only had a comment on the comment, but otherwise this LGTM as well.
# 1) Validate the formula and make sure the input that is modified is not used:
#   - If there is a formula for the inplace variant of the function (is_exact_match == True), then
#     we make sure that the original value of the input that is being modified inplace (self_p) is
#     not used in the formula. Note that the formula can use "original_self_p" here and that would
#     trigger a clone of the original input.
#   - If we are re-using the out-of-place formula (is_exact_match == False), then we replace every
#     occurrence of self_p and self_t by original_self_p and original_self_t. These will be
#     populated by a cloned version of the original input (either the clone done by the backward AD
#     logic if self is also used in a backward formula, or a special clone that we add).
# 2) At this point, there cannot be a self_p in the formula.
# 3) Change "result" into "self_p" as, by design, in the inplace function codegen the result is
#    simply called self (as it is modified inplace).
# 4) Update the required primals data in case it used to contain "result" but should now contain
#    "self".
# 5) If it is not an exact match, the user formula is not modifying the existing forward grad
#    inplace as it should, so add some code that makes sure that we do so if the forward grad
#    already exists.
nit: an example here would help; it's hard to read through a chunk of text without being able to visualize what happens.
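For instance, a minimal illustrative sketch of the renames described in steps 1-3 (this is not the real codegen; the formula below is just a mul-like example of reusing an out-of-place forward formula for the inplace variant):

```python
def adapt_formula_for_inplace(formula: str, is_exact_match: bool) -> str:
    if is_exact_match:
        # Step 1, exact-match case: the formula may read original_self_p (which
        # triggers a clone) but must not read the modified input self_p itself.
        assert "self_p" not in formula.replace("original_self_p", "")
    else:
        # Step 1, reuse case: every read of the original input goes through a clone.
        formula = formula.replace("self_p", "original_self_p")
        formula = formula.replace("self_t", "original_self_t")
    # Step 2: at this point the formula never refers to self_p directly.
    # Step 3: for an inplace op the result *is* self, so "result" becomes self_p.
    return formula.replace("result", "self_p")

print(adapt_formula_for_inplace("other_t * self_p + self_t * other_p", False))
# -> other_t * original_self_p + original_self_t * other_p
```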
Stack from ghstack:
Differential Revision: D29914593