
Add forward AD inplace check and fix codegen #60498


Closed
albanD wants to merge 9 commits

Conversation

@albanD (Collaborator) commented Jun 23, 2021

Stack from ghstack:

Differential Revision: D29914593

@facebook-github-bot (Contributor) commented Jun 23, 2021


💊 CI failures summary and remediations

As of commit 7ad9415 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_parallelnative_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 27 21:37:02 AssertionError: NotImplementedE...orward AD, you should set supports_forward_ad=True
Jul 27 21:37:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 814, in dep_fn
Jul 27 21:37:02     return fn(slf, *args, **kwargs)
Jul 27 21:37:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 814, in dep_fn
Jul 27 21:37:02     return fn(slf, *args, **kwargs)
Jul 27 21:37:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 780, in test_wrapper
Jul 27 21:37:02     return test(*args, **kwargs)
Jul 27 21:37:02   File "test_ops.py", line 657, in test_forward_mode_AD
Jul 27 21:37:02     self._forward_grad_helper(device, dtype, op, op.get_op())
Jul 27 21:37:02   File "test_ops.py", line 651, in _forward_grad_helper
Jul 27 21:37:02     self._grad_test_helper(device, dtype, op, variant, check_forward_ad=True)
Jul 27 21:37:02 AssertionError: NotImplementedError not raised : Running forward AD for an OP that has does not support it did not raise any error. If your op supports forward AD, you should set supports_forward_ad=True
Jul 27 21:37:02 
Jul 27 21:37:03 ----------------------------------------------------------------------
Jul 27 21:37:03 Ran 6411 tests in 926.881s
Jul 27 21:37:03 
Jul 27 21:37:03 FAILED (failures=2, skipped=2652)
Jul 27 21:37:03 
Jul 27 21:37:03 Generating XML reports...
Jul 27 21:37:03 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestCommonCPU-20210727212135.xml
Jul 27 21:37:03 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestGradientsCPU-20210727212135.xml
Jul 27 21:37:03 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestJitCPU-20210727212135.xml

See CircleCI build pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 27 21:50:27 AssertionError: NotImplementedE...orward AD, you should set supports_forward_ad=True
Jul 27 21:50:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 814, in dep_fn
Jul 27 21:50:27     return fn(slf, *args, **kwargs)
Jul 27 21:50:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 814, in dep_fn
Jul 27 21:50:27     return fn(slf, *args, **kwargs)
Jul 27 21:50:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 780, in test_wrapper
Jul 27 21:50:27     return test(*args, **kwargs)
Jul 27 21:50:27   File "test_ops.py", line 657, in test_forward_mode_AD
Jul 27 21:50:27     self._forward_grad_helper(device, dtype, op, op.get_op())
Jul 27 21:50:27   File "test_ops.py", line 651, in _forward_grad_helper
Jul 27 21:50:27     self._grad_test_helper(device, dtype, op, variant, check_forward_ad=True)
Jul 27 21:50:27 AssertionError: NotImplementedError not raised : Running forward AD for an OP that has does not support it did not raise any error. If your op supports forward AD, you should set supports_forward_ad=True
Jul 27 21:50:27 
Jul 27 21:50:28 ----------------------------------------------------------------------
Jul 27 21:50:28 Ran 6411 tests in 747.679s
Jul 27 21:50:28 
Jul 27 21:50:28 FAILED (failures=2, skipped=2652)
Jul 27 21:50:28 
Jul 27 21:50:28 Generating XML reports...
Jul 27 21:50:28 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestCommonCPU-20210727213800.xml
Jul 27 21:50:28 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestGradientsCPU-20210727213800.xml
Jul 27 21:50:28 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestJitCPU-20210727213800.xml

1 failure not recognized by patterns:

Job: GitHub Actions Lint / mypy
Step: Run mypy

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@albanD (Collaborator, Author) commented Jun 25, 2021

You can see the codegened code here (outdated), which contains the same commits but with the generated files checked in.

@zou3519 (Contributor) commented Jun 29, 2021

The PR body should contain more description of how the forward AD inplace check works and what fixes were applied. I think I know the gist from our discussions, but a summary is nice, and anyone who comes across this commit in the future may otherwise be confused.
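
(A hedged sketch for future readers: roughly what the generated inplace check amounts to for an op with no forward-AD formula. The wrapper shape, names, include paths, and error wording are approximations pieced together from the CI failure quoted above, not the PR's actual codegen output.)

#include <ATen/ATen.h>
#include <c10/util/Exception.h>
#include <torch/csrc/autograd/functions/utils.h>  // assumed location of isFwGradDefined

// Sketch of the guard placed at the top of an in-place wrapper (a made-up mul_-like
// wrapper here) when the op has no forward-AD support: if any input carries a forward
// grad, raise NotImplementedError instead of silently computing a wrong tangent.
at::Tensor& mul__wrapper_sketch(at::Tensor& self, const at::Tensor& other) {
  TORCH_CHECK_NOT_IMPLEMENTED(
      !(torch::autograd::isFwGradDefined(self) || torch::autograd::isFwGradDefined(other)),
      "Trying to use forward AD with mul_ that does not support it.");
  return self.mul_(other);  // dispatch to the real kernel
}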

Comment on lines 307 to 310
FW_DERIVATIVE_BOOLEAN_TEMPLATE = CodeTemplate("""\
auto ${boolean_var_name} = ${cond};
(void)${boolean_var_name};
""")

Contributor:

Is there a way to check that the compiler optimized away this code if it's not used?

auto _any_has_forward_grad_result = isFwGradDefined(self) || isFwGradDefined(other);
(void)_any_has_forward_grad_result;

I remember sometimes the compiler gets confused about whether it's safe to optimize things away (e.g. if it thinks isFwGradDefined has any side effects)

Collaborator (Author):

From checking the assembly, the calls to isFwGradDefined are indeed not optimized out.
I'll add the appropriate logic to avoid these calls for out= functions.

Contributor:

RIP, I wish the compiler were smarter

Collaborator (Author):

I think it could be, if they were inlined and only called simple functions, but that's not the case here :/

@soulitzer (Contributor) left a comment

Looks good to me! Just have some small questions.

@@ -761,7 +762,7 @@

 - name: polygamma_(Tensor(a!) self, int n) -> Tensor(a!)
   self: grad * polygamma(n + 1, self)
-  result: auto_element_wise
+  result: self_t.mul_(polygamma(n + 1, original_self_p))

Contributor:

There used to be some conj here? But I'm guessing that's okay because polygamma doesn't support complex.
It also looks like we avoid doing the mul by self_t out of place and then a copy_.
Are these optimizations that we could automate with auto_element_wise/auto_linear as well?

@albanD (Collaborator, Author) commented Jul 26, 2021

We have a grand total of ~10 functions that have formulas specifically for the inplace version. So I don't think we want to bother adding such logic to this automation.

Contributor:

Ah, good point.
What was the reasoning behind having formulas specifically for the inplace variants anyway?

Collaborator (Author):

A few of them exist because there is no out-of-place version.
A few others are for perf: they avoid cloning the input by providing an alternative implementation that only makes use of the output.
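
(A hedged illustration of that perf point with a hypothetical exp-like op, not an actual derivatives.yaml entry: when the derivative can be expressed through the output, the inplace formula can update the tangent without ever cloning the original input.)

#include <ATen/ATen.h>

// Sketch only: forward-grad update for a hypothetical in-place exp-like op.
// After the op has run, `self_after_op` already holds exp(original input), and since
// d/dx exp(x) = exp(x), the tangent can be updated from the output alone; no clone of
// the original input (original_self_p in derivatives.yaml terms) is needed.
void update_tangent_for_inplace_exp(at::Tensor& self_t, const at::Tensor& self_after_op) {
  self_t.mul_(self_after_op);  // self_t <- self_t * exp(original input)
}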

@albanD (Collaborator, Author) commented Jul 26, 2021

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@albanD (Collaborator, Author) commented Jul 27, 2021

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor):
@albanD merged this pull request in 4a36e2a.

@zou3519 (Contributor) left a comment

Sorry for the delayed review -- only had a comment on the comment but otherwise this lgtm as well

Comment on lines +269 to +285
# 1) Validate the formula and make sure the input that is modified is not used:
# - If there is a formula for the inplace variant of the function (is_exact_match == True) then
# we make sure that the original value of the input that is being modified inplace (self_p) is
# not used in the formula. Note that the formula can use "original_self_p" here and that would
# trigger a clone of the original input.
# - If we are re-using the out of place formula (is_exact_match == False) then we replace every
# occurrence of self_p and self_t by original_self_p and original_self_t. These will be
# populated by a cloned version of the original input (either the clone done by the backward AD
# logic if self is also used in a backward formula or a special clone that we add).
# 2) At this point, there cannot be a self_p in the formula.
# 3) Change "result" into "self_p" as by design, in the inplace function codegen, the result is
# simply called self (as it is modified inplace).
# 4) Update the required primals data in case it used to contain "result" but should now contain
# "self"
# 5) If it is not an exact match, the user formula is not modifying the existing forward grad
# inplace as it should. So add some code that makes sure that we do so if the forward grad
# already exists.

Contributor:

nit: an example here would help, it's hard to read through a chunk of text without being able to visualize what happens
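
(A hedged walk-through of the quoted steps using a hypothetical binary op foo_ whose out-of-place formula self_t * bar(self_p) + other_t is being reused for the inplace variant, i.e. is_exact_match == false. The function shape, helper bar(), and the _fw_grad/_set_fw_grad calls below are approximations of what the generated code does, not its actual output.)

#include <ATen/ATen.h>

// Placeholder for whatever elementwise factor the hypothetical formula uses.
static at::Tensor bar(const at::Tensor& t) { return t.sin(); }

void forward_ad_update_for_foo_(
    at::Tensor& self,                    // tensor modified in place by foo_
    const at::Tensor& original_self_p,   // clone of self taken before the op (step 1)
    const at::Tensor& original_self_t,   // clone of self's tangent taken before the op
    const at::Tensor& other_t) {         // tangent of the other input
  // Step 1 rewrites the reused out-of-place formula
  //     self_t * bar(self_p) + other_t
  // into
  //     original_self_t * bar(original_self_p) + other_t
  // so only pre-op values are read; steps 2-4 then guarantee no stray self_p remains,
  // rename any "result" token to self_p, and fix up which primals must be saved.
  auto self_new_fw_grad = original_self_t * bar(original_self_p) + other_t;

  // Step 5: the reused formula produced a fresh tensor, so self's existing forward grad
  // (if any) must be updated in place rather than silently replaced.
  auto self_fw_grad = self._fw_grad(/* level */ 0);
  if (self_fw_grad.defined()) {
    self_fw_grad.copy_(self_new_fw_grad);
  } else {
    self._set_fw_grad(self_new_fw_grad, /* level */ 0, /* is_inplace_op */ true);
  }
}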

@facebook-github-bot deleted the gh/albanD/104/head branch July 31, 2021 14:17
4 participants