New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8264469: Add Insert float nodes implementation for Arm SVE #56
8264469: Add Insert float nodes implementation for Arm SVE #56
Conversation
/contributor add Wang Huang whuang@openjdk.org |
👋 Welcome back whuang! A progress list of the required criteria for merging this PR into |
@Wanghuang-Huawei |
@Wanghuang-Huawei |
@Wanghuang-Huawei |
Webrevs
|
@Wanghuang-Huawei This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 453 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Assembler::SIMD_RegVariant size = | ||
elemType_to_regVariant(vector_element_basic_type(this)); | ||
__ sve_index(as_FloatRegister($tmp$$reg), __ S, 0, 1); | ||
__ sve_dup(as_FloatRegister($tmp2$$reg), __ S, (int)($idx$$constant)); | ||
__ sve_cmpeq(as_PRegister($pTmp$$reg), size, ptrue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use size
instead of __ S
all through the codes ? Or remove line 3322, and use __ S
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think this patch has separated rules for different types, so removing line 3322 is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will remove line 3322 and use __ S
in cmpeq
.
predicate(UseSVE > 0 && | ||
n->bottom_type()->is_vect()->element_basic_type() == T_FLOAT); | ||
match(Set dst (VectorInsert (Binary src val) idx)); | ||
effect(TEMP tmp, TEMP tmp2, TEMP pTmp, KILL cr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use tmp1, tmp2
instead of tmp, tmp2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
__ sve_dup(as_FloatRegister($tmp2$$reg), __ S, (int)($idx$$constant)); | ||
__ sve_cmpeq(as_PRegister($pTmp$$reg), size, ptrue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SVE support immedicate CMP: CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>
. Could you please use the imm version here? We can save one dup
instruction here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SVE support immedicate CMP:
CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>
. Could you please use the imm version here? We can save onedup
instruction here.
I have remembered why I choose this cmpeq
. If you use immedicate CMP
, the # <imm>
is in range [-16,15]
. I think we should use cmpeq
in my codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so a vector CMP is needed here. Thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can check whether idx is in CMPEQ imm range to generate different code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can check whether idx is in CMPEQ imm range to generate different code?
I add a fast path by using CMPEQ imm
.
I shift the range of idx from [0,31] to [-16,15] to suit the range limit of imm5
.
format %{ "sve_index $tmp, S, 0, 1\n\t" | ||
"sve_dup $tmp2, S, $idx\n\t" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is (F)
at the end of the format, is the type info S
really needed here? I just think it's inconsistent with the following formats which do not have the type info inside.
VECTOR_INSERT(S, iRegIorL2I, H, Register) | ||
VECTOR_INSERT(I, iRegIorL2I, S, Register) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry that I didn't look at the implementation PR for integer type. Is it better to merge the rules for "B/H/S" as a single one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry that I didn't look at the implementation PR for integer type. Is it better to merge the rules for "B/H/S" as a single one?
Agree. I plan to do some cleanup work before integrating to jdk mainline, which will include such kind of merging. If @Wanghuang-Huawei can do it in this patch, that would help to reduce future refactoring work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea. However, we can not comment the type info after merging, so I choose this implementation.
%{ | ||
predicate(UseSVE > 0 && | ||
predicate(UseSVE > 0 && n->as_Vector()->length() > 32 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have more than 32 doubles/longs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is useless rule now. I'll remove it.
ins_pipe(pipe_slow); | ||
%} | ||
|
||
instruct insertL_fast(vReg dst, vReg src, iRegL val, immI idx, vReg tmp, pRegGov pTmp, rFlagsReg cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the n->as_Vector()->length() >32
doesn't exist, is insertL
better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is considered that all concise
rules (which is named fast
now ) should have the common macro , otherwise we may use too many macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe a separate macro for L/D is better.
|
||
instruct insertL_fast(vReg dst, vReg src, iRegL val, immI idx, vReg tmp, pRegGov pTmp, rFlagsReg cr) | ||
%{ | ||
predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can n->as_Vector()->length() <= 32
be removed? Or maybe an assertion is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes . The reason is the same as last comment. #56 (comment)
Or we COULD split insertL and insertD from other ones ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me. Agree to split L/D from others.
Looks good to me, thanks! |
/integrate |
Thank you for your review. |
@Wanghuang-Huawei |
/sponsor |
@nsjian This change does not need sponsoring - the author is allowed to integrate it. |
/integrate |
@Wanghuang-Huawei Since your change was applied there have been 453 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit b6a5e16. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
cmpeq
(SVE compare vector with immediate). For the range limit ofimm5
is [-16, 15], I shift the index range from [0, 31] to [-16, 15].Progress
Issue
Reviewers
Contributors
<whuang@openjdk.org>
<aijiaming1@huawei.com>
<hexuejin2@huawei.com>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/56/head:pull/56
$ git checkout pull/56
Update a local copy of the PR:
$ git checkout pull/56
$ git pull https://git.openjdk.java.net/panama-vector pull/56/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 56
View PR using the GUI difftool:
$ git pr show -t 56
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/56.diff