Skip to content

Conversation

malfet
Copy link
Contributor

@malfet malfet commented Feb 22, 2024

By hiding float32 constructors and exposing float16 ones. This allows compiler do implicit conversions as needed, and in safe cases optimize out unneeded upcasts to fp32, see example below

#include <arm_neon.h>

#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
#error Ieeee
#endif

float16_t sum1(float16_t x, float16_t y) {
    return x + y;
}

float16_t sum2(float16_t x, float16_t y) {
    return static_cast<float>(x) + static_cast<float>(y);
}

both sum variants are compiled to scalar fp16 add, if build for the platform that supports fp16 arithmetic

sum1(half, half):                            // @sum1(half, half)
        fadd    h0, h0, h1
        ret
sum2(half, half):                            // @sum2(half, half)
        fadd    h0, h0, h1
        ret

Fixes build error in some aarch64 configurations after #119483 which are defined as supporting FP16 but don't define _Float16.

cc @snadampal

And let compiler do implicit conversions as needed
Copy link

pytorch-bot bot commented Feb 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120425

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit 734f0b3 with merge base 65627cf (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Contributor

@mikekgfb mikekgfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@malfet malfet added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Feb 22, 2024
@malfet malfet added the topic: not user facing topic category label Feb 22, 2024
@atalman
Copy link
Contributor

atalman commented Feb 22, 2024

@malfet Since #119483 was reverted should we merge both PRs, test and land them together ?

@malfet
Copy link
Contributor Author

malfet commented Feb 22, 2024

@malfet Since #119483 was reverted should we merge both PRs, test and land them together ?

If this can be landed, it potentially achieves the same results as the other PR, but with fewer lines of code

@malfet malfet added ciflow/mps Run MPS tests (subset of trunk) module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 labels Feb 23, 2024
@malfet
Copy link
Contributor Author

malfet commented Feb 23, 2024

@snadampal FYI

@malfet malfet added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 23, 2024
@snadampal snadampal self-requested a review February 23, 2024 03:34
@malfet
Copy link
Contributor Author

malfet commented Feb 23, 2024

@pytorchbot merge -i

@snadampal
Copy link
Collaborator

Looks good to me!

If I understand correctly, this is an extension to the fp16<->fp32 acceleration feature added as part of this PR. now it's reusing the same via Half operator. isn't it?

I'm wondering what the major use cases for fp16 (half) datatype kernels are.

@pytorchmergebot
Copy link
Collaborator

@malfet malfet deleted the malfet/construct-Half-from-float16-on-armv8 branch February 23, 2024 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request Merged module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants