Construct c10::Half from float16_t on ARMv8 #120425
Conversation
And let the compiler do implicit conversions as needed.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120425. Note: links to docs will display an error until the docs builds have completed.
✅ You can merge normally! (5 unrelated failures.) As of commit 734f0b3 with merge base 65627cf, the failing jobs were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM!
@snadampal FYI
@pytorchbot merge -i
Looks good to me! If I understand correctly, this is an extension to the fp16<->fp32 acceleration feature added as part of this PR, and now it's reusing the same mechanism. I'm wondering what the major use cases for fp16 (half) datatype kernels are.
Merge started. Your change will be merged while ignoring the following 5 checks: linux-binary-manywheel / manywheel-py3_12-cpu-cxx11-abi-build / build, linux-binary-manywheel / manywheel-py3_11-cpu-cxx11-abi-build / build, linux-binary-manywheel / manywheel-py3_8-cpu-cxx11-abi-build / build, linux-binary-manywheel / manywheel-py3_9-cpu-cxx11-abi-build / build, linux-binary-manywheel / manywheel-py3_10-cpu-cxx11-abi-build / build.
By hiding the float32 constructors and exposing float16 ones. This allows the compiler to do implicit conversions as needed and, in safe cases, to optimize out unneeded upcasts to fp32; see the example below.
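The original example isn't reproduced here; below is a minimal sketch of the idea, assuming a simplified `Half` that exposes only a `float16_t` constructor and conversion (the struct and the `sum_wrapped`/`sum_raw` names are illustrative, not code from the PR):

```cpp
#include <arm_neon.h>  // provides float16_t on ARMv8

// Illustrative stand-in for c10::Half: only the float16_t constructor and
// conversion are exposed, so arithmetic can stay in fp16 instead of
// bouncing through fp32.
struct Half {
  float16_t x;
  Half(float16_t v) : x(v) {}
  operator float16_t() const { return x; }
};

// Variant 1: accumulate through the Half wrapper; Half converts
// implicitly to float16_t, so no upcast to fp32 is required.
float16_t sum_wrapped(const Half* a, int n) {
  float16_t acc = 0;
  for (int i = 0; i < n; ++i) acc += a[i];
  return acc;
}

// Variant 2: accumulate on raw float16_t values.
float16_t sum_raw(const float16_t* a, int n) {
  float16_t acc = 0;
  for (int i = 0; i < n; ++i) acc += a[i];
  return acc;
}
```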
Both sum variants are compiled to a scalar fp16 add if built for a platform that supports fp16 arithmetic.
Fixes a build error in some aarch64 configurations after #119483 that are defined as supporting FP16 but don't define _Float16.
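As a hedged illustration of that failure mode (not taken from the PR itself), such a toolchain rejects `_Float16` while still accepting the ACLE `float16_t` type from `<arm_neon.h>`:

```cpp
#include <arm_neon.h>

// On an aarch64 toolchain that advertises FP16 support but does not
// define _Float16, the commented line fails to compile, while the
// float16_t declaration below builds fine:
// _Float16 bad = 1.0f;   // error: unknown type name '_Float16'
float16_t good = 1.0f;    // OK: float16_t comes from <arm_neon.h>
```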
cc @snadampal