8345125: Aarch64: Add aarch64 backend for Float16 scalar operations #23748
Conversation
This patch adds an aarch64 backend for scalar FP16 operations, namely add, subtract, multiply, divide, fma, sqrt, min and max.
👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into […] will be added to the body of your pull request.

@Bhavana-Kilambi This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: […] You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 18 new commits pushed to the […]. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @shqking) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type […]

@Bhavana-Kilambi The following labels will be automatically applied to this pull request: […]

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Webrevs
0x7e0,  0xfc0,  0x1f80, 0x3ff0, 0x7e00, 0x8000,
0x81ff, 0xc1ff, 0xc003, 0xc7ff, 0xdfff, 0xe03f,
0xe1ff, 0xf801, 0xfc00, 0xfc07, 0xff03, 0xfffe]
So here you've deleted the duplicated 0x7e00 (good) but also the not-duplicated 0xe10f. Is 0xe10f not valid?
Hi, yes, 0xe10f does not seem to be valid. While generating asmtest.out.h I ran into errors with this value:
aarch64ops.s:1105: Error: immediate out of range at operand 3 -- eor z6.h,z6.h,#0xe10f
aarch64ops.s:1123: Error: immediate out of range at operand 3 -- eor z3.h,z3.h,#0xe10f
So I looked it up here - https://gist.github.com/dinfuehr/51a01ac58c0b23e4de9aac313ed6a06a - to see if this number is a legal immediate, and it looks like it isn't. Maybe it's just by chance that this number was never generated before as an immediate operand, so these errors didn't show up until now.
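For readers who want to check such values themselves: the sketch below is a simplified, hedged re-implementation of the AArch64 "bitmask immediate" validity rule for 16-bit (.h) elements - a value is encodable only if, at some element size of 2, 4, 8 or 16 bits, it is a replicated pattern that is a rotation of a contiguous run of ones (all-zeros and all-ones excluded). Class and method names here are illustrative, not HotSpot's real encoder.

```java
// Hedged sketch: validity check for AArch64 logical (bitmask) immediates
// as used by e.g. "eor z6.h, z6.h, #imm". Only the 16-bit-element case
// discussed above is covered; the real encoder handles the full 64-bit scheme.
public class Fp16ImmCheck {
    // rotate right within an esize-bit field
    static int ror(int v, int rot, int esize) {
        int mask = (1 << esize) - 1;
        v &= mask;
        return ((v >>> rot) | (v << (esize - rot))) & mask;
    }

    static boolean isValidH(int value) {
        value &= 0xFFFF;
        for (int esize : new int[]{2, 4, 8, 16}) {
            int emask = (1 << esize) - 1;
            int pattern = value & emask;
            // the 16-bit value must be the esize-bit pattern replicated
            boolean replicated = true;
            for (int i = 0; i < 16; i += esize) {
                if (((value >> i) & emask) != pattern) { replicated = false; break; }
            }
            if (!replicated) continue;
            int ones = Integer.bitCount(pattern);
            if (ones == 0 || ones == esize) continue; // all-zeros/all-ones are not encodable
            // pattern must be some rotation of a contiguous run of ones
            for (int rot = 0; rot < esize; rot++) {
                if (ror(pattern, rot, esize) == (1 << ones) - 1) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValidH(0x7e00)); // contiguous run of 6 ones -> true
        System.out.println(isValidH(0xe10f)); // ones are not one circular run -> false
    }
}
```

Running it on the list above confirms every remaining value is encodable, while 0xe10f is not, matching the assembler errors quoted in the comment.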
Overall, this looks like a great piece of work. I only have a few changes in comments and a question, then we're good to go.
INSN(fnmuld, 0b000, 0b01, 0b100010, 0b1);

// Half-precision floating-point instructions
INSN(fabdh, 0b011, 0b11, 0b000101, 0b0);
I suppose fabdh and fnmulh are added to keep aligned with the float and double ones, i.e. fabd(s|d) and fnmul(s|d).
I noticed that there are matching rules for fabd(s|d), i.e. absd(F|D)_reg. I wonder if we need to add the corresponding rule for fp16 here?
Hi @shqking, thanks for your review comments. Yes, I added fabdh and fnmulh to keep aligned with the float and double types.
For adding support for FP16 absd, we need AbsHF to be supported (along with SubHF), but the AbsHF node is not implemented currently. The abs operation is executed directly from the Java code here:

public static Float16 abs(Float16 f16) { … }

and the negate operation for FP16:

return shortBitsToFloat16((short)(f16.value ^ (short)0x0000_8000));

On the Valhalla repo, while these operations were being developed, I tried adding support for AbsHF/NegHF, which emitted fabs and fneg instructions, but the performance of the direct Java code (bit-manipulation operations) was much better (sorry, I don't remember the exact numbers), so we decided to go with the Java implementation instead. I still added fabd here because op21 is 0 only in the fabd H variant and I felt it would be better to handle it here as it belongs to this group of instructions. Please let me know your thoughts.
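To make the bit-manipulation approach above concrete, here is a hedged Java sketch (not the actual Float16 source; class and method names are illustrative): on the raw 16-bit encoding (1 sign, 5 exponent, 10 mantissa bits), abs clears the sign bit and negate flips it.

```java
// Hedged sketch of FP16 abs/negate via bit manipulation on the raw short
// encoding, mirroring the approach described in the comment above.
public class Fp16Bits {
    static short abs(short bits)    { return (short) (bits & 0x7FFF); } // clear sign bit
    static short negate(short bits) { return (short) (bits ^ (short) 0x8000); } // flip sign bit

    public static void main(String[] args) {
        short minusOne = (short) 0xBC00; // FP16 encoding of -1.0
        System.out.printf("%04x%n", abs(minusOne));    // prints 3c00 (+1.0)
        System.out.printf("%04x%n", negate(minusOne)); // prints 3c00 (+1.0)
    }
}
```

No floating-point unit is involved at all, which is why this can beat a round-trip through fabs/fneg intrinsics.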
According to the RM, fabd is in Advanced SIMD scalar three same FP16, but the rest are in Floating-point data-processing (2 source). The decoding scheme looks rather different. fabd, then, doesn't really fit here, but in a section with the rest of the three same FP16 instructions.
The encoding scheme for Advanced SIMD scalar three same FP16 is pretty simple, so I suggest you create a new group for them, and put fabd in there.
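The simplicity mentioned above comes from fixed field positions: the group keeps three 5-bit register fields at bits 20:16 (Rm), 9:5 (Rn) and 4:0 (Rd) under a fixed base opcode. The sketch below is an assumption-laden illustration, not HotSpot's macro style; in particular, BASE_FABD_H is my reading of the FABD (scalar, half-precision) encoding from the Arm ARM and should be verified against the manual before use.

```java
// Hedged sketch: packing register numbers into an "Advanced SIMD scalar
// three same FP16" style instruction word. BASE_FABD_H is an assumption
// (FABD Hd, Hn, Hm per my reading of the Arm ARM) - double-check it.
public class ThreeSameFp16 {
    static final int BASE_FABD_H = 0x7EC01400; // assumed base opcode

    // Rm -> bits 20:16, Rn -> bits 9:5, Rd -> bits 4:0
    static int encode(int base, int rd, int rn, int rm) {
        return base | (rm & 0x1F) << 16 | (rn & 0x1F) << 5 | (rd & 0x1F);
    }

    public static void main(String[] args) {
        int insn = encode(BASE_FABD_H, 0, 1, 2); // fabd h0, h1, h2
        System.out.printf("0x%08x%n", insn);
    }
}
```

A dedicated assembler group then only needs one such packing helper plus the per-instruction base constants.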
@Bhavana-Kilambi Thanks for your explanation of the missing AbsHF. It's okay with me to have fabdh and fnmulh in this patch.
Overall it looks good to me, except for aph's comment above.
Hi @theRealAph, thanks again for the review and apologies for the delay in responding.
I moved the three fabd instructions out of their current place and added them to two separate sections - one for single and double precision (Advanced SIMD scalar three same) and another for FP16 (Advanced SIMD scalar three same FP16). Please review the changes. Thank you!
@Bhavana-Kilambi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Hello @shqking @theRealAph, sincere apologies for the delay in addressing the review comments. I am planning on uploading a patch soon addressing all review comments. Thank you!

Hello, I will not be able to respond to comments for the next couple of months or so due to some urgent tasks at work. Until then, I'd move this PR to draft status so that it is not closed due to lack of activity. Thank you for the review!
@Bhavana-Kilambi this pull request can not be integrated […]:

git checkout JDK-8345125
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
Can you please uncomment the following tests using aarch64 float16 in this PR? […]

Done. Thanks for notifying.

Hi @Bhavana-Kilambi, I noticed there exists an inconsistency between […]

Thanks. I think I missed generating the […]

Hi @shqking, I have regenerated the […]
theRealAph left a comment:

Looks good to me. Please summarize the tests that you've run before committing.

Thank you for the approval.
shqking left a comment:

Thanks for your update. Looks good to me.
On 4/25/25 10:17, bkilambi wrote:
Bhavana-Kilambi left a comment (openjdk/jdk#23748)
> Looks good to me. Please summarize the tests that you've run before committing.
Thank you for the approval.
All hotspot (hotspot_all), jdk (tiers 1-3) and langtools (tier1) pass on N1, V1 and V2 architectures.
But do any of these tests use Float16 ?
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Yes they do. Please let me know if you'd like me to test anything else. Thanks!
Hi @theRealAph, is it ok if I integrate this or would you like me to do any other testing on the patch?

As long as you've got test coverage for everything here, go ahead.

Thanks a lot! Can I ask you to please sponsor this patch?

@Bhavana-Kilambi […]

/sponsor
Going to push as commit 3140de4.
Your commit was automatically rebased without conflicts.

@shqking @Bhavana-Kilambi Pushed as commit 3140de4. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

This caused a regression: JDK-8355708
Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23748/head:pull/23748
$ git checkout pull/23748

Update a local copy of the PR:
$ git checkout pull/23748
$ git pull https://git.openjdk.org/jdk.git pull/23748/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23748

View PR using the GUI difftool:
$ git pr show -t 23748

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23748.diff