8341414: Add support for FP16 conversion routines #1283

Closed

Conversation

Bhavana-Kilambi
Contributor

@Bhavana-Kilambi Bhavana-Kilambi commented Oct 23, 2024

This patch adds intrinsic support for the FP16 conversion routines to int/long/double, together with the aarch64 backend support. Both scalar and vector versions of these conversions are implemented.

Performance numbers on an aarch64 machine with SVE support:

Benchmark                         (vectorDim)   Gain
Float16OpsBenchmark.fp16ToDouble  1024          18.23
Float16OpsBenchmark.fp16ToInt     1024          1.93
Float16OpsBenchmark.fp16ToLong    1024          3.95

The Gain column is the ratio of the throughput with this patch to the throughput with the intrinsics disabled (which generates FP32 arithmetic).


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8341414: Add support for FP16 conversion routines (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1283/head:pull/1283
$ git checkout pull/1283

Update a local copy of the PR:
$ git checkout pull/1283
$ git pull https://git.openjdk.org/valhalla.git pull/1283/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1283

View PR using the GUI difftool:
$ git pr show -t 1283

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1283.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 23, 2024

👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into lworld+fp16 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 23, 2024

@Bhavana-Kilambi This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341414: Add support for FP16 conversion routines

Reviewed-by: jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the lworld+fp16 branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@mlbridge

mlbridge bot commented Oct 23, 2024

Webrevs

@@ -649,6 +650,7 @@ public int intValue() {
* @jls 5.1.3 Narrowing Primitive Conversion
*/
@Override
@IntrinsicCandidate
Member

Should this be handled through an Ideal transform?

ConvHF2F + ConvF2L => ConvHF2L

@@ -638,6 +638,7 @@ public short shortValue() {
* @jls 5.1.3 Narrowing Primitive Conversion
*/
@Override
@IntrinsicCandidate
public int intValue() {
return (int)floatValue();
Member

Should it be handled through an Ideal transform?

ConvHF2F + ConvF2I => ConvHF2I
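
To make the proposed fusion concrete: at the Java level intValue() is a half-to-float decode followed by a narrowing cast, so a fused ConvHF2I node would have to produce the same results as that two-step sequence. The sketch below is only an illustration under assumptions. HalfToIntSketch, halfBitsToFloat and halfToInt are hypothetical names, and the decoder is written by hand from the IEEE 754 binary16 layout rather than taken from the Float16 class.

```java
class HalfToIntSketch {
    // Hypothetical helper: decode binary16 bits (1 sign, 5 exponent, 10 mantissa bits) to float.
    static float halfBitsToFloat(short bits) {
        int exp = (bits >> 10) & 0x1F;
        int man = bits & 0x3FF;
        float mag;
        if (exp == 0) {
            mag = man * 0x1p-24f;                              // zero or subnormal: man * 2^-24
        } else if (exp == 31) {
            mag = (man == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            mag = (1024 + man) * Math.scalb(1.0f, exp - 25);   // (1 + man/1024) * 2^(exp-15)
        }
        return (bits < 0) ? -mag : mag;                        // sign bit is the short's sign
    }

    // Two-step semantics (ConvHF2F followed by ConvF2I) that a fused node must match.
    static int halfToInt(short bits) {
        return (int) halfBitsToFloat(bits);
    }

    public static void main(String[] args) {
        short[] samples = { 0x3C00, 0x4248, (short) 0xC500, 0x7C00, 0x7E00 };
        for (short s : samples) {
            System.out.printf("0x%04X -> %d%n", s & 0xFFFF, halfToInt(s));
        }
    }
}
```

NaN maps to 0 and the infinities saturate, following the narrowing rules of JLS 5.1.3 that the original method cites.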

Member

A matcher pattern may also suffice, but that becomes a problem if ConvHF2F has multiple users.
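
The multiple-users concern can be illustrated with a toy graph. This is a hypothetical model, not C2's Node API: FusionSketch, Node and idealize are made-up names, and the rewrite only mimics the shape of an Ideal-style transform. It shows that the fused node can be created from the consumer's side while a shared ConvHF2F stays alive for its other user.

```java
import java.util.ArrayList;
import java.util.List;

class FusionSketch {
    // Hypothetical toy IR node with a single input and a list of users.
    static final class Node {
        final String op;
        final Node input;                        // null for leaf nodes
        final List<Node> users = new ArrayList<>();

        Node(String op, Node input) {
            this.op = op;
            this.input = input;
            if (input != null) input.users.add(this);
        }
    }

    // Rewrites ConvF2I(ConvHF2F(x)) to ConvHF2I(x); returns n unchanged otherwise.
    static Node idealize(Node n) {
        if (n.op.equals("ConvF2I") && n.input != null && n.input.op.equals("ConvHF2F")) {
            Node hf2f = n.input;
            hf2f.users.remove(n);                // the old consumer no longer uses ConvHF2F
            return new Node("ConvHF2I", hf2f.input);
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node("LoadS", null);
        Node hf2f = new Node("ConvHF2F", x);
        Node f2i = new Node("ConvF2I", hf2f);
        Node addF = new Node("AddF", hf2f);      // second user of the same ConvHF2F

        Node fused = idealize(f2i);
        System.out.println(fused.op + " reads " + fused.input.op);
        System.out.println("ConvHF2F still has " + hf2f.users.size() + " user(s)");
    }
}
```

In this model the rewrite does not need ConvHF2F to be exclusively owned, which is the property a tree-shaped matcher rule would depend on.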

Member

If you check the JDK mainline Float16 C2 compiler support PR, John has suggested that we go with pattern matching. I have added more details to that draft PR.

Contributor Author

@Bhavana-Kilambi Bhavana-Kilambi Oct 28, 2024

Hi Jatin, thanks for the review comments. I did go through the comments on your draft PR earlier, but I wasn't sure whether we would be following them on the Valhalla branch yet. I do agree with him that having too many intrinsics is overkill. I'll remove the @IntrinsicCandidate here and add Ideal transforms to do the pattern matching in the mid-end.

@@ -679,6 +681,7 @@ public float floatValue() {
* @jls 5.1.2 Widening Primitive Conversion
*/
@Override
@IntrinsicCandidate
public double doubleValue() {
return (double)floatValue();
Member

@jatin-bhateja jatin-bhateja Oct 28, 2024

Should it be handled through an Ideal transform?

ConvHF2F + ConvF2D => ConvHF2D
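
Because float to double is a widening conversion, the two-step path should lose nothing compared with a direct half-to-double conversion. The following sketch checks that over all 65536 bit patterns. It rests on assumptions: HalfToDoubleSketch and both decoders are hypothetical, hand-written from the binary16 layout, and stand in for whatever a ConvHF2D node would compute.

```java
class HalfToDoubleSketch {
    // Step 1 of the two-step path: binary16 bits -> float.
    static float halfBitsToFloat(short bits) {
        int exp = (bits >> 10) & 0x1F;
        int man = bits & 0x3FF;
        float mag;
        if (exp == 0)       mag = man * 0x1p-24f;
        else if (exp == 31) mag = (man == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        else                mag = (1024 + man) * Math.scalb(1.0f, exp - 25);
        return (bits < 0) ? -mag : mag;
    }

    // Direct path: binary16 bits -> double, never touching float.
    static double halfBitsToDouble(short bits) {
        int exp = (bits >> 10) & 0x1F;
        int man = bits & 0x3FF;
        double mag;
        if (exp == 0)       mag = man * 0x1p-24;
        else if (exp == 31) mag = (man == 0) ? Double.POSITIVE_INFINITY : Double.NaN;
        else                mag = (1024 + man) * Math.scalb(1.0, exp - 25);
        return (bits < 0) ? -mag : mag;
    }

    // Counts bit patterns where the two paths disagree (NaNs compare as equal to each other).
    static int countMismatches() {
        int mismatches = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            short bits = (short) i;
            double twoStep = (double) halfBitsToFloat(bits);
            double direct = halfBitsToDouble(bits);
            boolean same = Double.isNaN(direct)
                    ? Double.isNaN(twoStep)
                    : Double.doubleToLongBits(direct) == Double.doubleToLongBits(twoStep);
            if (!same) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The comparison uses raw bits so that +0.0 and -0.0 are told apart; NaN payloads are deliberately not compared.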

@Bhavana-Kilambi
Contributor Author

Hi @jatin-bhateja , I have uploaded a patch addressing your comments. Please review.

@Bhavana-Kilambi
Contributor Author

Bhavana-Kilambi commented Nov 6, 2024

Hi Jatin, I have added support for float16 to int and long. Apologies for missing the conversion to short. Will add that. This patch adds support for fp16 -> double as well. fp16->float is already taken care of.

@jatin-bhateja
Member

jatin-bhateja commented Nov 7, 2024

My bad, I meant the other way round, i.e. the integral to float16 conversion case, which currently takes a slow path. Consider the following micro kernel:

public class float16_allocation {
   public static float micro(int value) {
       Float16 val = Float16.valueOf(value); // [a]
       return val.floatValue();              // [b]
   }

   public static void main(String [] args) {
       float res = 0.0f;
       for (int i = 0; i < 100000; i++) {
           res += micro(i);
       }
       System.out.println("[res]" + res);
   }
}

Here, the integer parameter is first converted to a float16 value [a]. The valueOf routine first casts the integer value to double and then passes it to the Float16.valueOf(double) routine, resulting in a bulky JIT sequence.

We can outline the following code [c] into a new leaf routine returning a short value, and directly pass it to the Float16 constructor similar to https://github.com/openjdk/valhalla/blob/lworld%2Bfp16/src/java.base/share/classes/java/lang/Float16.java#L411

The new routine can then be intrinsified to yield ConvI2HF IR, whose result then gets boxed as a value object. Since Float16 is a value type, its field accesses will be scalarized, thus directly forwarding the HF ('short') value to the subsequent ConvHF2F [b]. On mainline, where Float16 is a value-based class, we can bank on escape analysis to eliminate redundant boxing allocations.

    public static Float16 valueOf(int value) {
        // int -> double conversion is exact
        return valueOf((double)value);     // [c] 
    }

We can spill this over to another patch if you suggest it, kindly let me know your views.

Best Regards,
Jatin
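
The shape of the proposal above (a leaf routine that returns raw half-float bits, with boxing done separately on the Java side) might look roughly like the sketch below. Everything here is an assumption for illustration: IntToHalfSketch, Half and intToHalfBits are hypothetical names, Half is a plain class standing in for the value class, and the rounding is a hand-written round-to-nearest-even rather than the code an intrinsic would emit.

```java
class IntToHalfSketch {
    // Hypothetical stand-in for the boxed type; only the raw bits field matters here.
    static final class Half {
        final short bits;

        private Half(short bits) { this.bits = bits; }

        // Boxing stays on the Java side; the leaf routine only produces bits.
        static Half valueOf(int value) { return new Half(intToHalfBits(value)); }
    }

    // Hypothetical leaf routine: int -> binary16 bits, round to nearest even.
    static short intToHalfBits(int value) {
        if (value == 0) return 0;
        int sign = value < 0 ? 0x8000 : 0;
        long mag = Math.abs((long) value);               // long avoids overflow on MIN_VALUE
        int e = 63 - Long.numberOfLeadingZeros(mag);     // index of the highest set bit
        long m;                                          // 11-bit significand incl. hidden bit
        if (e <= 10) {
            m = mag << (10 - e);                         // exactly representable
        } else {
            int shift = e - 10;
            m = mag >> shift;
            long rem = mag & ((1L << shift) - 1);
            long half = 1L << (shift - 1);
            if (rem > half || (rem == half && (m & 1) == 1)) m++;
            if (m == 2048) { m = 1024; e++; }            // rounding carried into the exponent
        }
        if (e > 15) return (short) (sign | 0x7C00);      // overflow to infinity
        return (short) (sign | ((e + 15) << 10) | (int) (m & 0x3FF));
    }

    public static void main(String[] args) {
        for (int v : new int[] { 1, 3, 2049, 2051, 65504, 65520, -1 }) {
            System.out.printf("%d -> 0x%04X%n", v, Half.valueOf(v).bits & 0xFFFF);
        }
    }
}
```

Keeping the conversion in a routine that returns a short means the same routine could later back other entry points without the intrinsic having to know about the box type.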

@Bhavana-Kilambi
Contributor Author

Hi @jatin-bhateja , Thanks for the reminder. I remember asking you in a previous email about the reverse conversions and I forgot about that myself.
I am thinking if we have to intrinsify, can we not directly intrinsify Float16 valueOf() routines in Float16 itself instead of defining a new routine in Integer.java and then calling it in the Float16.valueOf() method and intrinsifying the one in Integer.java?

@Bhavana-Kilambi
Contributor Author

I would prefer if I can do this in a separate patch please? I feel this patch is big enough. I will add some Ideal/Identity transformations as required for the new IR (for ex. ConvHF2I <-> ConvI2HF return the half float etc) in the new patch.

@jatin-bhateja
Member

jatin-bhateja commented Nov 8, 2024

> Hi @jatin-bhateja , Thanks for the reminder. I remember asking you in a previous email about the reverse conversions and I forgot about that myself. I am thinking if we have to intrinsify, can we not directly intrinsify Float16 valueOf() routines in Float16

The idea here is to avoid complicating the scalar intrinsic by delegating boxing to the expander; otherwise we would also have to pass an additional box type argument. Instead, we can rely on explicit boxing happening on the Java side and bank on escape analysis for its elimination, thus directly exposing ConvI2HF to its user.

> new routine in Integer.java and then calling it in the Float16.valueOf() method and intrinsifying the one in Integer.java?

No, I am not suggesting adding a <Primitive_Box_Type>.float16Value() API to the existing primitive classes for the time being; let Joe decide that. If you intrinsify a leaf-level wrapper routine, then we just need to plug that into Integer.float16Value(). We would lose this flexibility if we intrinsified Float16.valueOf(int).

@jatin-bhateja
Member

> I would prefer if I can do this in a separate patch please? I feel this patch is big enough. I will add some Ideal/Identity transformations as required for the new IR (for ex. ConvHF2I <-> ConvI2HF return the half float etc) in the new patch.

Sounds good!

@Bhavana-Kilambi
Contributor Author

Bhavana-Kilambi commented Nov 8, 2024

> No, I am not suggesting to add <Primitive_Box_Type>.float16Value() API in existing primitive classes for time being, let Joe decide that. If you intrinsify leaf level wrapper routine, then we just need to plug that into Integer.float16Value(), we will lose this flexibility if we intrinsify Float16.valueOf(int).

Thanks for the explanation. So from what I understand, we currently have four valueOf() routines in Float16.java, converting int/long/float/double to float16. valueOf(long) calls valueOf(float), which already contains an intrinsified routine, so ConvL2F -> ConvF2HF is generated for it. I can add an Ideal optimization to generate ConvL2HF for this sequence.

For valueOf(int), which calls valueOf(double), I will add a new leaf routine, something like short d2hf(double d), in Float16.java, which I will intrinsify to generate ConvD2HF. This routine will be called in valueOf(int value):

public static Float16 valueOf(int value) {
    return new Float16(d2hf((double) value));
}

This would probably generate a ConvI2D -> ConvD2HF which can be optimized to ConvI2HF. The same routine can be called in Float16 valueOf(double value) as well which should generate ConvD2HF. Is this ok?

@Bhavana-Kilambi
Contributor Author

Can you also please review this patch? If it's all ok for you then this can be integrated and I can work on the next conv patch on top of this.

Member

@jatin-bhateja jatin-bhateja left a comment

Hi @Bhavana-Kilambi ,
I have added a few comments; the patch looks good otherwise.

Best Regards,
Jatin

src/hotspot/share/opto/cfgnode.cpp (outdated, resolved)
src/hotspot/share/opto/cfgnode.cpp (resolved)
@Bhavana-Kilambi
Contributor Author

Thank you Jatin for your comments. I will address them in a new patch soon.

Member

@jatin-bhateja jatin-bhateja left a comment

Hi @Bhavana-Kilambi , I am working on adding x86 backend support.
Kindly address the pending concerns in a follow-up patch.

Comment on lines +5504 to +5507
case T_FLOAT: __ flt16_to_flt(v0, r0, v1, T_FLOAT); break;
case T_DOUBLE: __ flt16_to_flt(v0, r0, v1, T_DOUBLE); break;
default: ShouldNotReachHere();
}
Member

@jatin-bhateja jatin-bhateja Nov 14, 2024

OK, I revisited this: the conversion stubs used for constant folding call the direct FP16 to INT/LONG/DOUBLE instructions.

This looks reasonable. However, constant folding happens at compile time, so there may be no runtime penalty even if we remove the stubs and cast directly to the target type after the hf2f stub conversion.

@Bhavana-Kilambi
Contributor Author

Hi @jatin-bhateja , I have addressed your comments. Please review. Thanks!

@Bhavana-Kilambi
Contributor Author

Hi @jatin-bhateja , can I integrate these changes if you feel these are good to go?

@jatin-bhateja
Member

jatin-bhateja commented Nov 26, 2024

> Hi @jatin-bhateja , can I integrate these changes if you feel these are good to go?

Hi @Bhavana-Kilambi ,

Kindly integrate this, you now have committer's rights :-)

Best Regards,
Jatin

FTR, upcasting from float16 to float using the existing runtime helpers is precision preserving, so constant folding for the newly introduced scalar IR can be performed by a subsequent integral cast, thereby avoiding the need for the newly introduced helpers. This can be addressed in a follow-up patch; we can take this liberty on a project branch.

@Bhavana-Kilambi
Contributor Author

Yes :) I just wanted to make sure you are ok with the changes before I integrate.

So for constant folding we would do something like - convert half float value to float and then to integral/double right? But that would mean extra instructions in the backend when we have direct instructions to cast from half float to integral/double?

@Bhavana-Kilambi
Contributor Author

/integrate

@openjdk

openjdk bot commented Nov 26, 2024

Going to push as commit 6bcf899.

@openjdk openjdk bot added the integrated label Nov 26, 2024
@openjdk openjdk bot closed this Nov 26, 2024
@openjdk

openjdk bot commented Nov 26, 2024

@Bhavana-Kilambi Pushed as commit 6bcf899.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jatin-bhateja
Member

> Yes :) I just wanted to make sure you are ok with the changes before I integrate.
>
> So for constant folding we would do something like - convert half float value to float and then to integral/double right? But that would mean extra instructions in the backend when we have direct instructions to cast from half float to integral/double?

Since constants are folded at compile time, adding runtime helpers just for constant folding looks like overkill. After folding, the compiler operates directly on the constant IR.

Best Regards,
Jatin
