-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts #16670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ed) casts Vector API defines zero-extend operations [1], which are going to be intrinsified and generated to `VectorUCastNode` by C2. This patch adds backend implementation for VectorUCastNode on AArch64. The micro benchmark shows significant performance improvement. In my test machine (SVE, 256-bit), the result is shown as below: Benchmark Before After Units Gain VectorZeroExtend.byte2Int 3168.251 243012.399 ops/ms 75.70 VectorZeroExtend.byte2Long 3212.201 216291.588 ops/ms 66.33 VectorZeroExtend.byte2Short 3391.968 182655.365 ops/ms 52.85 VectorZeroExtend.int2Long 1012.197 80448.553 ops/ms 78.48 VectorZeroExtend.short2Int 1812.471 153416.828 ops/ms 83.65 VectorZeroExtend.short2Long 1788.382 129794.814 ops/ms 71.58 On other Neon systems, we can get similar performance boost as a result of intrinsification success. Since `VectorUCastNode` only used in Vector API's zero extension currently, this patch also adds assertion on nodes' definitions to clarify their usages. [TEST] compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines. [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726 Change-Id: I10770759f158975ead1eecd3fb63280e563ed5e2
|
👋 Welcome back eliu! A progress list of the required criteria for merging this PR into |
Webrevs
|
| %} | ||
| ins_pipe(pipe_slow); | ||
| %} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following hunk does not seem to be making good use of the macro processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated m4 file. Please help to review, thanks!
| // 4B/8B to 4S/8S | ||
| assert(dst_vlen_in_bytes == 8 || dst_vlen_in_bytes == 16, "unsupported"); | ||
| sxtl(dst, T8H, src, T8B); | ||
| (this->*ext)(dst, T8H, src, T8B); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing might make this cleaner: I suggest you make _xshll protected rather than private, then here
_xshll(is_unsigned, dst, T8H, src, T8B, 0);
| case S: | ||
| sve_sunpklo(dst, H, src); | ||
| sve_sunpklo(dst, S, dst); | ||
| (this->*unpklo)(dst, H, src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AS above: try making is_unsigned a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will fix it soon. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiler/vectorapi and jdk/incubator/vector passed. Full test is running. I would report the result when it has been finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full jtreg passed without new failure.
Change-Id: Ic19836feb8a73ea7e65443794f2a0eb1363f6e2f
| // Signed unpack and extend half of vector - low half | ||
| void sve_sunpklo(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) { | ||
| _sve_xunpk(/* is_unsigned */ false, /* is_high */ false, Zd, T, Zn); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code expansion does not look right. You should be able to make this change without so much code expansion.
#define INSN(NAME, unsigned, high)
void name(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) { \
_sve_xunpk(unsigned, high, T, Zn); \
}
INSN(sve_uunpkhi, true, true) ...
| _sve_xunpk(is_unsigned, /* is_high */ false, dst, H, src); | ||
| _sve_xunpk(is_unsigned, /* is_high */ false, dst, S, dst); | ||
| _sve_xunpk(is_unsigned, /* is_high */ false, dst, D, dst); | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change here? It doesn't do anything. Does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_unsigned is also used in this function. It is used in VectorUCastNode for zero extending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
|
@theRealAph Could you help to take a look? Thanks. |
theRealAph
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks nice. Thanks.
|
@e1iu This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 316 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
XiaohongGong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
/integrate |
|
Going to push as commit 9b8eaa2.
Your commit was automatically rebased without conflicts. |
Vector API defines zero-extend operations [1], which are going to be intrinsified and generated to
VectorUCastNodeby C2. This patch adds backend implementation forVectorUCastNodeon AArch64.The micro benchmark shows significant performance improvement. In my test machine (SVE, 256-bit), the result is shown as below:
On other Neon systems, we can get similar performance boost as a result of intrinsification success.
Since
VectorUCastNodeonly used in Vector API's zero extension currently, this patch also adds assertion on nodes' definitions to clarify their usages.[TEST]
compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16670/head:pull/16670$ git checkout pull/16670Update a local copy of the PR:
$ git checkout pull/16670$ git pull https://git.openjdk.org/jdk.git pull/16670/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16670View PR using the GUI difftool:
$ git pr show -t 16670Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16670.diff
Webrev
Link to Webrev Comment