Added pow() on CPU for float16 & bfloat16 #50999
Conversation
💊 CI failures summary and remediations. As of commit 3457e35 (more details on the Dr. CI page):
4 failures not recognized by patterns.
@imaginary-person do me a favor and ping me tomorrow or Wednesday on this PR and I'll take a look.
Codecov Report
@@ Coverage Diff @@
## master #50999 +/- ##
=======================================
Coverage 77.45% 77.46%
=======================================
Files 1894 1894
Lines 186403 186437 +34
=======================================
+ Hits 144374 144416 +42
+ Misses 42029 42021 -8
Force-pushed from 96e2d44 to 5c57e01
Hey @imaginary-person!
There's a lot of good stuff going on here. An OpInfo needs to be added for pow, and the legacy pow sample inputs removed from method_tests in common_methods_invocations.py. Implementing an OpInfo for pow will also let you remove those legacy test_torch.py pow sample inputs and not have to worry about updating them.
As for the pow function itself, can we really not simplify the implementation to avoid tripling its size? pow has been an extremely tricky function to get right, and I'm worried about making it even harder to maintain.
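For readers unfamiliar with OpInfos, here is a rough sketch of the kind of entry being requested. It is written as it would appear inside torch/testing/_internal/common_methods_invocations.py, where OpInfo, SampleInput and make_tensor are already in scope; the dtype list, shapes and value ranges are illustrative assumptions, not the entry that eventually landed.

```python
import torch

# OpInfo, SampleInput and make_tensor are assumed to be in scope, as they are
# inside torch/testing/_internal/common_methods_invocations.py.

def sample_inputs_pow(op_info, device, dtype, requires_grad, **kwargs):
    # Keep bases positive so gradients and low-precision dtypes behave.
    make_arg = lambda: make_tensor((2, 2), device=device, dtype=dtype,
                                   low=0.5, high=2.0,
                                   requires_grad=requires_grad)
    return [
        SampleInput(make_arg(), args=(3.0,)),         # pow(Tensor, Scalar)
        SampleInput(make_arg(), args=(make_arg(),)),  # pow(Tensor, Tensor)
    ]

# Illustrative dtype list only; the real entry enumerates the supported dtypes
# per device along with any skips or expected failures.
pow_opinfo = OpInfo(
    'pow',
    dtypes=(torch.float32, torch.float64, torch.float16, torch.bfloat16),
    sample_inputs_func=sample_inputs_pow,
)
```

Once such an entry exists, the generic OpInfo-based tests exercise pow automatically, which is why the legacy method_tests and test_torch.py sample inputs can be dropped.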
LGTM! @anjali411 do you want to check the complex part before I land this?
Thanks for the great work on this PR @imaginary-person
@heitorschueroff, thanks a lot for your & @mruberry's enormous help & patience with this PR!
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Removed scalar_t = decltype(c10::impl::ScalarTypeToCPPType<ScalarType::Half>::t), as the AT_DISPATCH_FLOATING_TYPES_AND macro does this assignment anyway.
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
lgtm
@heitorschueroff merged this pull request in 6d030c1.
@imaginary-person This was not a trivial task and the refactor + tests you added are a great contribution to the project. Thank you!
This pull request has been reverted by 8377e62.
Sorry for the inconvenience! EDIT: #54949 caught a bug in this PR. For 4 of the 9 bool sample inputs, the dtype wasn't actually bool! 😞
Get changes from main repo
Summary:

Reason for relanding

Line 1607 of `torch/testing/_internal/common_methods_invocations.py` in #50999 had `dtype` instead of `dtype=torch.bool`, so 4 of the 9 sample inputs for `bool` had an incorrect dtype. This bug was caught by #54949.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types. Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types. However, autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`. It provides a common path for all the complex types & floating point types (except `Float16`, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types, so `PowKernel.cpp` looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`, `erfinv`, `tan` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
7. Removed redundant `dtypesIfCPU` and `dtypesIfCUDA` from `OpInfo`s where they are equal to `dtypes`.

Pull Request resolved: #55280
Reviewed By: jbschlosser
Differential Revision: D27591772
Pulled By: heitorschueroff
fbshipit-source-id: c7420811b32595bb3353149a61e54a73f2eb352b
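To make the reason for relanding concrete, here is a minimal illustration (not the actual line 1607) of the kind of mistake described above: a sample-input construction reused the ambient `dtype` variable where `torch.bool` was intended, so the "bool" sample inputs silently took on whatever dtype the test was currently running with.

```python
import torch

dtype = torch.float32  # stands in for the dtype the OpInfo test is running with

# Buggy form: the tensor meant to exercise bool inputs inherits the ambient
# dtype instead of being forced to torch.bool.
bad_exponent = torch.tensor([0, 1, 1, 0], dtype=dtype)

# Intended form: the dtype is spelled out explicitly.
good_exponent = torch.tensor([0, 1, 1, 0], dtype=torch.bool)

assert bad_exponent.dtype != torch.bool
assert good_exponent.dtype == torch.bool
```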
Added the functionality desired in #50789.
Summary
1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types. Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types. However, autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. Added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`. It provides a common path for all the complex types & floating point types (except `Float16`, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types, so `PowKernel.cpp` looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`, `erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation (a standalone sketch follows after this list).
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
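Below is a small, self-contained sketch of the two behaviors the extended tests exercise: checking pow on a discontiguous CPU `float16` tensor against `numpy`, and confirming that the in-place variant raises when the base tensor would need to be resized. This uses only public torch/numpy APIs for illustration; it is not the code added to `test_binary_ufuncs.py`.

```python
import numpy as np
import torch

# pow on a discontiguous (strided) CPU float16 tensor, checked against numpy.
base = torch.arange(16, dtype=torch.float16)[::2]   # non-contiguous view
assert not base.is_contiguous()
result = torch.pow(base, 2)
np.testing.assert_allclose(result.numpy(), np.power(base.numpy(), 2),
                           rtol=1e-3, atol=1e-3)

# The in-place variant must raise rather than resize the base tensor: the
# broadcast result of shapes (1,) and (4,) is (4,), which doesn't fit in-place.
small = torch.ones(1, dtype=torch.float16)
exponent = torch.ones(4, dtype=torch.float16)
try:
    small.pow_(exponent)
except RuntimeError:
    pass  # expected: the base tensor would need to be resized
```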