-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8341414: Add support for FP16 conversion routines #1283
Conversation
This patch adds intrinsic support for FP16 conversion routines to int/long/double and also the aarch64 backend support. This patch implements both scalar and vector versions for these conversions. Performance numbers on aarch64 machine with SVE support : Benchmark (vectorDim) Gain Float16OpsBenchmark.fp16ToDouble 1024 18.23 Float16OpsBenchmark.fp16ToInt 1024 1.93 Float16OpsBenchmark.fp16ToLong 1024 3.95 The Gain column is the ratio between thrpt of this patch and the thrpt with the intrinsics disabled (which generates FP32 arithmetic).
👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into |
@Bhavana-Kilambi This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Webrevs
|
@@ -649,6 +650,7 @@ public int intValue() { | |||
* @jls 5.1.3 Narrowing Primitive Conversion | |||
*/ | |||
@Override | |||
@IntrinsicCandidate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be handled trough Idealization xform ?
ConvHF2F + ConvF2L => ConvHF2L
@@ -638,6 +638,7 @@ public short shortValue() { | |||
* @jls 5.1.3 Narrowing Primitive Conversion | |||
*/ | |||
@Override | |||
@IntrinsicCandidate | |||
public int intValue() { | |||
return (int)floatValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be handled trough Idealization xform ?
ConvHF2F + ConvF2I => ConvHF2I
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matcher pattern may also suffice, but the problem will if ConvHF2F has multiple users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you check JDK mainline Float16 C2 compiler support PR, John has suggested us to go with pattern matching, I have added more details to that draft PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jatin, thanks for the review comments. I did go through the comments on your draft PR earlier but I wasn't sure if we would be following on their comments on the Valhalla branch as yet but I do agree with him about having too many intrinsics being an overkill. I'll remove the @IntrinsicCandidate
here and add Ideal transforms to do pattern matching in the mid-end.
@@ -679,6 +681,7 @@ public float floatValue() { | |||
* @jls 5.1.2 Widening Primitive Conversion | |||
*/ | |||
@Override | |||
@IntrinsicCandidate | |||
public double doubleValue() { | |||
return (double)floatValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be handled trough Idealization xform ?
ConvHF2F + ConvF2D => ConvHF2D
Hi @jatin-bhateja , I have uploaded a patch addressing your comments. Please review. |
Hi Jatin, I have added support for float16 to int and long. Apologies for missing the conversion to |
My bad, I meant the other way round i.e. integral to float16 conversion case, which takes a slow path route currently. Consider the following micro kernel:-
Here, the integer parameter is first converted to float16 value [a], valueOf routine first typecast integer value to double type and then passes it to Float16.valueOf(double) routine resulting in a bulky JIT sequence. We can outline the following code [c] into a new leaf routine returning a short value, and directly pass it to the Float16 constructor similar to https://github.com/openjdk/valhalla/blob/lworld%2Bfp16/src/java.base/share/classes/java/lang/Float16.java#L411 New routine can then be intrinsified to yield ConvI2HF IR, which then gets boxed as a value object. Since Float16 is a value type, it will scalarize its field accesses, thus directly forwarding HF ('short') value to subsequent ConvHF2F [b]. On mainline where Float16 is a value-based class we can bank on escape analysis to eliminate redundant boxing allocations.
We can spill this over to another patch if you suggest it, kindly let me know your views. Best Regards, |
Hi @jatin-bhateja , Thanks for the reminder. I remember asking you in a previous email about the reverse conversions and I forgot about that myself. |
I would prefer if I can do this in a separate patch please? I feel this patch is big enough. I will add some Ideal/Identity transformations as required for the new IR (for ex. ConvHF2I <-> ConvI2HF return the half float etc) in the new patch. |
Idea here is to avoid complexifying scalar intrinsic by delegating boxing to expander, otherwise we will also have to pass additional box type argument. Instead, we can rely explicit boxing happening in Java side and bank on escape analysis for its elimination thus directly exposing ConvI2HF to its user.
No, I am not suggesting to add <Primitive_Box_Type>.float16Value() API in existing primitive classes for time being, let Joe decide that. If you intrinsify leaf level wrapper routine, then we just need to plug that into Integer.float16Value(), we will lose this flexibility if we intrinsify Float16.valueOf(int). |
Sounds good! |
Thanks for the explanation. So from what I understand - we currently have four valueOf() routines in Float16.java, float16 to int/long/float/double. The valueOf(long) calls valueOf(float) inside it which contains an intrinsified routine already so we have ConvL2F -> ConvF2HF being generated for that. I can add an Ideal optimization to generate ConvL2HF for this sequence. For
This would probably generate a ConvI2D -> ConvD2HF which can be optimized to ConvI2HF. The same routine can be called in |
Can you also please review this patch? If it's all ok for you then this can be integrated and I can work on the next conv patch on top of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Bhavana-Kilambi ,
I have added few comments, patch looks good otherwise.
Best Regards,
Jatin
Thank you Jatin for your comments. I will address them in a new patch soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Bhavana-Kilambi , I am working on adding x86 backend support.
Kindly address the pending concerns in follow-up patch.
case T_FLOAT: __ flt16_to_flt(v0, r0, v1, T_FLOAT); break; | ||
case T_DOUBLE: __ flt16_to_flt(v0, r0, v1, T_DOUBLE); break; | ||
default: ShouldNotReachHere(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I re-visited this, so conversion stubs for constant folding call direct FP16 to INT/LONG/DOUBLE instructions.
This looks reasonable. Though constant folding is something that happens at compile time so it may not result in any runtime penalty even if we remove the stubs and directly cast to target type after hf2f stub conversion.
Hi @jatin-bhateja , I have addressed your comments. Please review. Thanks! |
Hi @jatin-bhateja , can I integrate these changes if you feel these are good to go? |
Hi @Bhavana-Kilambi , Kindly integrate this, you now have committer's rights :-) Best Regards, FTR, upcasting from float16 to float conversions using existing runtime helpers is precision preserving, and constant folding for newly introduced scalar IR can be performed by subsequent integral casting thereby avoiding the need for newly introduced helpers. This can be addressed in a follow-up patch, we can take this liberty on a project branch. |
Yes :) I just wanted to make sure you are ok with the changes before I integrate. So for constant folding we would do something like - convert half float value to float and then to integral/double right? But that would mean extra instructions in the backend when we have direct instructions to cast from half float to integral/double? |
/integrate |
Going to push as commit 6bcf899. |
@Bhavana-Kilambi Pushed as commit 6bcf899. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Since constants are folded at compile time, adding runtime helpers just for constant folding looks like an overkill. Post folding compile will directly operate over constant IR. Best Regards, |
This patch adds intrinsic support for FP16 conversion routines to int/long/double and also the aarch64 backend support. This patch implements both scalar and vector versions for these conversions.
Performance numbers on aarch64 machine with SVE support :
The Gain column is the ratio between thrpt of this patch and the thrpt with the intrinsics disabled (which generates FP32 arithmetic).
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1283/head:pull/1283
$ git checkout pull/1283
Update a local copy of the PR:
$ git checkout pull/1283
$ git pull https://git.openjdk.org/valhalla.git pull/1283/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1283
View PR using the GUI difftool:
$ git pr show -t 1283
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1283.diff
Using Webrev
Link to Webrev Comment