
8264469: Add Insert float nodes implementation for Arm SVE #56

Closed

Wanghuang-Huawei
Collaborator

@Wanghuang-Huawei Wanghuang-Huawei commented Mar 31, 2021

  • Add Insert float nodes implementation for Arm SVE, such as insertD and insertF.
  • Add a fast path using cmpeq (SVE compare vector with immediate). Because the imm5 operand is limited to [-16, 15], I shift the index range from [0, 31] to [-16, 15].
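The fast path in the second bullet can be modeled lane by lane in scalar C++. This is a hypothetical sketch, not HotSpot code: each loop iteration stands in for one SVE lane, the index vector is assumed to start at -16 with step 1, and the matching lane is assumed to receive val through a merging predicated copy. The function name insert_lane_fast is made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of the SVE insert fast path.
// Each loop iteration stands in for one vector lane; this is not HotSpot code.
template <std::size_t N>
std::array<float, N> insert_lane_fast(const std::array<float, N>& src,
                                      float val, int idx) {
  // The fast path only covers vectors of at most 32 lanes, so that every
  // shifted index fits in the signed 5-bit immediate range [-16, 15].
  static_assert(N <= 32, "fast path assumes at most 32 lanes");
  assert(idx >= 0 && idx < static_cast<int>(N));

  const int imm = idx - 16;        // immediate operand for cmpeq, in [-16, 15]
  std::array<float, N> dst = src;  // dst starts as a copy of src
  for (std::size_t lane = 0; lane < N; ++lane) {
    // Models an index vector starting at -16 with step 1: lane i holds i - 16.
    const int tmp = static_cast<int>(lane) - 16;
    // Models "cmpeq pTmp, ptrue, tmp, #imm": the predicate is set only in lane idx.
    const bool active = (tmp == imm);
    // Models a merging predicated copy of val into the active lane.
    if (active) {
      dst[lane] = val;
    }
  }
  return dst;
}
```

For example, calling insert_lane_fast on an 8-lane vector with idx = 5 changes only lane 5. Subtracting 16 from both the lane indices and idx keeps the equality test exact while keeping every immediate inside [-16, 15].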

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8264469: Add Insert float nodes implementation for Arm SVE

Reviewers

Contributors

  • Wang Huang <whuang@openjdk.org>
  • Ai Jiaming <aijiaming1@huawei.com>
  • He Xuejin <hexuejin2@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/56/head:pull/56
$ git checkout pull/56

Update a local copy of the PR:
$ git checkout pull/56
$ git pull https://git.openjdk.java.net/panama-vector pull/56/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 56

View PR using the GUI difftool:
$ git pr show -t 56

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/56.diff

@Wanghuang-Huawei
Collaborator Author

/contributor add Wang Huang whuang@openjdk.org
/contributor add Ai Jiaming aijiaming1@huawei.com
/contributor add He Xuejin hexuejin2@huawei.com

@bridgekeeper

bridgekeeper bot commented Mar 31, 2021

👋 Welcome back whuang! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Mar 31, 2021
@openjdk

openjdk bot commented Mar 31, 2021

@Wanghuang-Huawei
Contributor Wang Huang <whuang@openjdk.org> successfully added.

@openjdk

openjdk bot commented Mar 31, 2021

@Wanghuang-Huawei
Contributor Ai Jiaming <aijiaming1@huawei.com> successfully added.

@openjdk

openjdk bot commented Mar 31, 2021

@Wanghuang-Huawei
Contributor He Xuejin <hexuejin2@huawei.com> successfully added.

@mlbridge

mlbridge bot commented Mar 31, 2021

Webrevs

@openjdk

openjdk bot commented Mar 31, 2021

@Wanghuang-Huawei This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264469: Add Insert float nodes implementation for Arm SVE

Co-authored-by: Wang Huang <whuang@openjdk.org>
Co-authored-by: Ai Jiaming <aijiaming1@huawei.com>
Co-authored-by: He Xuejin <hexuejin2@huawei.com>
Reviewed-by: njian, xgong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 453 new commits pushed to the vectorIntrinsics branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the vectorIntrinsics branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Mar 31, 2021
Comment on lines 3322 to 3326
Assembler::SIMD_RegVariant size =
elemType_to_regVariant(vector_element_basic_type(this));
__ sve_index(as_FloatRegister($tmp$$reg), __ S, 0, 1);
__ sve_dup(as_FloatRegister($tmp2$$reg), __ S, (int)($idx$$constant));
__ sve_cmpeq(as_PRegister($pTmp$$reg), size, ptrue,
Collaborator

Use size instead of __ S throughout the code? Or remove line 3322 and use __ S.

Collaborator

Good catch. I think this patch has separate rules for different types, so removing line 3322 is better.

Collaborator

Totally agree!

Collaborator Author

Sure. I will remove line 3322 and use __ S in cmpeq.
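Because the insertF rule's predicate only matches T_FLOAT vectors, the size computed at line 3322 is always the S variant. The sketch below uses hypothetical enums modeled on HotSpot's BasicType and Assembler::SIMD_RegVariant (not the real declarations) to show roughly what elemType_to_regVariant() computes for the numeric lane types, and why a type-specific rule can use the constant directly.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's BasicType and Assembler::SIMD_RegVariant.
// Names are modeled on HotSpot; these are not the real declarations.
enum class BasicType { T_BYTE, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE };
enum class RegVariant { B, H, S, D };

// Sketch of the element-type -> SVE element-size mapping for the six
// numeric lane types (assumption: other types are not handled here).
constexpr RegVariant elem_type_to_variant(BasicType bt) {
  switch (bt) {
    case BasicType::T_BYTE:   return RegVariant::B;  // 8-bit lanes
    case BasicType::T_SHORT:  return RegVariant::H;  // 16-bit lanes
    case BasicType::T_INT:
    case BasicType::T_FLOAT:  return RegVariant::S;  // 32-bit lanes
    case BasicType::T_LONG:
    case BasicType::T_DOUBLE: return RegVariant::D;  // 64-bit lanes
  }
  return RegVariant::B;  // not reached for the enumerators above
}

// A rule whose predicate already restricts the element type to T_FLOAT
// always gets S, so the runtime lookup adds nothing over a constant __ S.
static_assert(elem_type_to_variant(BasicType::T_FLOAT) == RegVariant::S,
              "float lanes are 32-bit (S)");
```

Keeping a separate rule per element type is what makes the constant safe; a merged rule covering several types would still need the lookup.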

predicate(UseSVE > 0 &&
n->bottom_type()->is_vect()->element_basic_type() == T_FLOAT);
match(Set dst (VectorInsert (Binary src val) idx));
effect(TEMP tmp, TEMP tmp2, TEMP pTmp, KILL cr);
Collaborator

Use tmp1, tmp2 instead of tmp, tmp2?

Collaborator Author

OK.

Comment on lines 3325 to 3326
__ sve_dup(as_FloatRegister($tmp2$$reg), __ S, (int)($idx$$constant));
__ sve_cmpeq(as_PRegister($pTmp$$reg), size, ptrue,
Collaborator

SVE supports an immediate CMP: CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>. Could you please use the immediate version here? That would save one dup instruction.

Collaborator Author

I remember now why I chose this form of cmpeq. With the immediate CMP, #<imm> must be in the range [-16, 15], so I think we should keep the vector cmpeq in my code.

Collaborator

OK, so a vector CMP is needed here. Thanks for the explanation!

Collaborator

Maybe you can check whether idx is in CMPEQ imm range to generate different code?

Collaborator Author

I added a fast path using CMPEQ with an immediate.

I shift the range of idx from [0, 31] to [-16, 15] to fit the range limit of imm5.
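The range argument above can be checked with a few compile-time helpers. This is a hypothetical C++ sketch (fits_simm5, kShift and can_use_imm_path are illustrative names, not HotSpot functions): an unshifted index of 31 does not fit the signed 5-bit immediate, while after subtracting 16 from both sides every index of a vector with up to 32 lanes does.

```cpp
#include <cassert>

// True if v fits the signed 5-bit immediate of SVE
// "CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>", i.e. v is in [-16, 15].
constexpr bool fits_simm5(int v) { return v >= -16 && v <= 15; }

// Hypothetical shift applied to both the lane-index vector and idx.
// Lane i matches idx exactly when (i - 16) == (idx - 16), so the shift
// does not change which lane is selected.
constexpr int kShift = 16;

// True if every lane index of a vector with 'length' lanes can be encoded
// as a cmpeq immediate after the shift; otherwise the general
// dup + vector-vector cmpeq path is still needed.
constexpr bool can_use_imm_path(int length) {
  return length >= 1 && fits_simm5((length - 1) - kShift);
}

// Without the shift, indices 16..31 would not be encodable.
static_assert(!fits_simm5(31), "unshifted index 31 does not fit imm5");
// With the shift, up to 32 lanes are covered, but not more.
static_assert(can_use_imm_path(32), "32 lanes fit after the shift");
static_assert(!can_use_imm_path(33), "33 lanes do not fit");
```

The shift trades one extra constant in the index vector for removing the dup, and it works only while the lane count stays at or below 32.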

Comment on lines 3316 to 3317
format %{ "sve_index $tmp, S, 0, 1\n\t"
"sve_dup $tmp2, S, $idx\n\t"
Collaborator

Since there is (F) at the end of the format, is the type info S really needed here? It seems inconsistent with the following formats, which do not include the type info.

Comment on lines 1922 to 1923
VECTOR_INSERT(S, iRegIorL2I, H, Register)
VECTOR_INSERT(I, iRegIorL2I, S, Register)
Collaborator

Sorry, I didn't look at the implementation PR for the integer types. Would it be better to merge the rules for "B/H/S" into a single one?

Collaborator

Agreed. I plan to do some cleanup work before integrating into the JDK mainline, which will include this kind of merging. If @Wanghuang-Huawei can do it in this patch, that would reduce future refactoring work.

Collaborator Author

It's a good idea. However, after merging we could no longer put the type info in the format comment, so I chose this implementation.

%{
predicate(UseSVE > 0 &&
predicate(UseSVE > 0 && n->as_Vector()->length() > 32 &&
Collaborator

Is it possible to have more than 32 doubles/longs?

Collaborator Author

Yes. It is a useless rule now. I'll remove it.
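The answer follows from the architectural limit: SVE vector registers are at most 2048 bits wide, so the maximum lane count per element width is a simple division. A short C++ sketch of that arithmetic (the helper max_lanes is hypothetical):

```cpp
#include <cassert>

// The SVE architecture caps the vector length at 2048 bits.
constexpr int kMaxSveBits = 2048;

// Maximum number of lanes for a given element width in bits.
constexpr int max_lanes(int elem_bits) { return kMaxSveBits / elem_bits; }

static_assert(max_lanes(64) == 32, "longs/doubles: at most 32 lanes");
static_assert(max_lanes(32) == 64, "ints/floats: up to 64 lanes");
static_assert(max_lanes(16) == 128, "shorts: up to 128 lanes");
static_assert(max_lanes(8) == 256, "bytes: up to 256 lanes");
```

So a length() > 32 predicate can never hold for longs or doubles, while byte, short, int and float vectors can still exceed 32 lanes on wide SVE implementations.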

ins_pipe(pipe_slow);
%}

instruct insertL_fast(vReg dst, vReg src, iRegL val, immI idx, vReg tmp, pRegGov pTmp, rFlagsReg cr)
Collaborator

Since the n->as_Vector()->length() > 32 case doesn't exist, would insertL be a better name here?

Collaborator Author

I think all the concise rules (currently named _fast) should share a common macro; otherwise we may end up with too many macros.

Collaborator

Yes, maybe a separate macro for L/D is better.


instruct insertL_fast(vReg dst, vReg src, iRegL val, immI idx, vReg tmp, pRegGov pTmp, rFlagsReg cr)
%{
predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 &&
Collaborator

Can n->as_Vector()->length() <= 32 be removed? Or maybe an assertion is better.

Collaborator Author

Yes. The reason is the same as in my last comment: #56 (comment).
Or could we split insertL and insertD from the other rules?

Collaborator

This seems reasonable to me. I agree with splitting L/D from the others.

@XiaohongGong
Collaborator

Looks good to me, thanks!

@Wanghuang-Huawei
Collaborator Author

/integrate

@Wanghuang-Huawei
Collaborator Author

Thank you for your review.

@openjdk openjdk bot added the sponsor label Apr 6, 2021
@openjdk

openjdk bot commented Apr 6, 2021

@Wanghuang-Huawei
Your change (at version b471c94) is now ready to be sponsored by a Committer.

@nsjian
Collaborator

nsjian commented Apr 6, 2021

Looks good to me, but I would like to defer integration until @sviswa7 integrates #58, in case of any conflicts.

@nsjian
Collaborator

nsjian commented Apr 7, 2021

/sponsor

@openjdk

openjdk bot commented Apr 7, 2021

@nsjian This change does not need sponsoring - the author is allowed to integrate it.

@Wanghuang-Huawei
Collaborator Author

/integrate

@openjdk openjdk bot closed this Apr 7, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Apr 7, 2021
@openjdk

openjdk bot commented Apr 7, 2021

@Wanghuang-Huawei Since your change was applied there have been 453 commits pushed to the vectorIntrinsics branch:

Your commit was automatically rebased without conflicts.

Pushed as commit b6a5e16.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

3 participants