This repository was archived by the owner on Sep 2, 2022. It is now read-only.

8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers #136

Closed
wants to merge 9 commits into from

Conversation

dgbo
Member

@dgbo dgbo commented Jan 28, 2021

This fixes a typo introduced by JDK-8255949.
The compiler generates ushr for shifting right and accumulating four unsigned short integers,
which produces wrong results in specific cases. The instruction should be usra.
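
To make the difference concrete, here is a minimal scalar sketch of what the two instructions compute on four unsigned 16-bit lanes. It is an illustration only, not code from this PR: the class and method names are hypothetical, lanes are modeled as int values masked to 16 bits, and the shift is assumed to be in the range 1..15.

```java
import java.util.Arrays;

// Hypothetical scalar model of two AArch64 SIMD instructions (4H arrangement).
// Each lane is an int holding an unsigned 16-bit value.
final class ShiftAccumulateModel {
    // USHR: dst[i] = src[i] >>> shift. The previous dst value is discarded.
    static int[] ushr(int[] dst, int[] src, int shift) {
        int[] out = new int[dst.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = ((src[i] & 0xFFFF) >>> shift) & 0xFFFF;
        }
        return out;
    }

    // USRA: dst[i] = dst[i] + (src[i] >>> shift), wrapping at 16 bits.
    static int[] usra(int[] dst, int[] src, int shift) {
        int[] out = new int[dst.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = ((dst[i] & 0xFFFF) + ((src[i] & 0xFFFF) >>> shift)) & 0xFFFF;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dst = {100, 200, 300, 400};
        int[] src = {16, 32, 64, 128};
        System.out.println(Arrays.toString(usra(dst, src, 4))); // [101, 202, 304, 408]
        System.out.println(Arrays.toString(ushr(dst, src, 4))); // [1, 2, 4, 8]
    }
}
```

With ushr the accumulator is lost, so the two results agree only when every dst lane is zero beforehand.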


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136
$ git checkout pull/136

@bridgekeeper

bridgekeeper bot commented Jan 28, 2021

👋 Welcome back dongbo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dgbo
Member Author

dgbo commented Jan 28, 2021

/label add hotspot-dev

@openjdk openjdk bot added rfr Pull request is ready for review hotspot labels Jan 28, 2021
@openjdk

openjdk bot commented Jan 28, 2021

@dgbo
The hotspot label was successfully added.

@mlbridge

mlbridge bot commented Jan 28, 2021

Webrevs

@vnkozlov

/label add hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.java.net label Jan 29, 2021
@openjdk

openjdk bot commented Jan 29, 2021

@vnkozlov
The hotspot-compiler label was successfully added.

@vnkozlov

Someone familiar with the AArch64 assembler has to review this before it is approved for JDK 16.

@openjdk

openjdk bot commented Jan 29, 2021

@dgbo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers

Reviewed-by: iveresov, dlong, njian, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • 1a7040e: 8259794: Remove EA from JDK 16 version string starting with Initial RC promotion on Feb 04, 2021(B35)
  • afd5eef: 8260704: ParallelGC: oldgen expansion needs release-store for _end
  • 081fa3e: 8260927: StringBuilder::insert is incorrect without Compact Strings
  • ed1a775: 8258378: Final nroff manpage update for JDK 16
  • 21f8bf4: 8257215: JFR: Events dropped when streaming over a chunk rotation
  • 0fdf9cd: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled
  • bc41bb1: 8260632: Build failures after JDK-8253353
  • a117e11: 8260339: JVM crashes when executing PhaseIdealLoop::match_fill_loop
  • 8ffdbce: 8260608: add a regression test for 8260370
  • 1926765: 8253353: Crash in C2: guarantee(n != NULL) failed: No Node

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@veresov, @dean-long, @nsjian, @theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 29, 2021
@dean-long
Member

Why didn't the testing for JDK-8255949 catch this? Do you need to fix the regression test too?

@vnkozlov

Yes, we need a regression test for this fix, or an existing test modified to catch it.

@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 30, 2021
@dgbo
Member Author

dgbo commented Jan 30, 2021

I did not run local tests for small loops in JDK-8255949.
I have updated a test to cover all shift-and-accumulate operations, so it can now catch this.
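
For readers without the patch at hand, the following is a rough sketch of how such a check can be structured. It is not the test from this PR: the class name, the shift amount, the iteration count, and the loop sizes are all assumptions chosen for illustration. The idea is to run a vectorizable loop often enough for the optimizing compiler to kick in, then compare each element with a value computed one element at a time.

```java
// Hypothetical regression-style check; not the test added by this PR.
final class ShiftAccumulateCheck {
    static final int SHIFT = 3; // assumed shift amount, below 16

    // Loop shaped like an unsigned shift-right-and-accumulate over chars.
    // A JIT compiler may vectorize it.
    static void shiftURightAccumulateChar(char[] a, char[] b, char[] d, int count) {
        for (int i = 0; i < count; i++) {
            d[i] = (char) (a[i] + (b[i] >>> SHIFT));
        }
    }

    // Returns the index of the first wrong element, or -1 if all match.
    static int firstMismatch(int count) {
        char[] a = new char[count];
        char[] b = new char[count];
        char[] d = new char[count];
        for (int i = 0; i < count; i++) {
            a[i] = (char) (i * 7 + 1);       // non-zero accumulator values
            b[i] = (char) (0xFFFF - i * 13); // large values so the shift matters
        }
        // Repeat so the loop gets hot enough to be compiled.
        for (int iter = 0; iter < 20_000; iter++) {
            shiftURightAccumulateChar(a, b, d, count);
        }
        for (int i = 0; i < count; i++) {
            int expected = (a[i] + (b[i] >>> SHIFT)) & 0xFFFF;
            if (d[i] != expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Small loop sizes are included because they select the shorter vector forms.
        for (int count : new int[] {4, 8, 16, 80}) {
            System.out.println("count=" + count + " firstMismatch=" + firstMismatch(count));
        }
    }
}
```

The accumulator inputs are deliberately non-zero: if ushr were emitted in place of usra, a zero accumulator would hide the difference.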

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 30, 2021
@mlbridge

mlbridge bot commented Jan 30, 2021

Mailing list message from Andrew Haley on hotspot-dev:

On 1/30/21 5:07 AM, Dong Bo wrote:

I don't understand. Looking at this:

instruct vsrla4S_imm(vecD dst, vecD src, immI shift) %{
  predicate(n->as_Vector()->length() == 4);
  match(Set dst (AddVS dst (URShiftVS src (RShiftCntV shift))));
  ins_cost(INSN_COST);
  format %{ "usra $dst, $src, $shift\t# vector (4H)" %}
  ins_encode %{
    int sh = (int)$shift$$constant;
    if (sh >= 16) {
      __ eor(as_FloatRegister($src$$reg), __ T8B,
             as_FloatRegister($src$$reg),
             as_FloatRegister($src$$reg));
    } else {
      __ usra(as_FloatRegister($dst$$reg), __ T4H,
              as_FloatRegister($src$$reg), sh);
    }
  %}
  ins_pipe(vshift64_imm);
%}

What happens when the shift is >= 16? What happens to src and dst?

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@dgbo
Member Author

dgbo commented Jan 31, 2021

> What happens when the shift is >= 16? What happens to src and dst?

That code was wrong: both src and dst should keep the values they had before.
In practice, when the shift is >= 16, the URShift is optimized to zero by the compiler.
So vsrla4S_imm is never matched when shift >= 16, and the wrong eor is not generated.
See the assembly code for the following test:

# test
public void shiftURightAccumulateChar() {
      for (int i = 0; i < count; i++) {
           charsD[i] = (char) (charsA[i] + (charsB[i] >>> 16));
      }
}
# assembly code: the shift is gone, only loads and stores (a plain copy) remain
  1.17%  │   0x0000ffff88075348:   ldr  q16, [x14,#16]
         │   0x0000ffff8807534c:   add  x12, x19, x12
         │   0x0000ffff88075350:   str  q16, [x12,#16]
  1.66%  │   0x0000ffff88075354:   ldr  q16, [x14,#32]
         │   0x0000ffff88075358:   str  q16, [x12,#32]
  2.03%  │   0x0000ffff8807535c:   ldr  q16, [x14,#48]
         │   0x0000ffff88075360:   str  q16, [x12,#48]
  1.39%  │   0x0000ffff88075364:   ldr  q16, [x14,#64]
         │   0x0000ffff88075368:   str  q16, [x12,#64]
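
The folding follows from Java's own semantics: a char is promoted to an int in the range 0..0xFFFF before shifting, so an unsigned right shift by 16 always yields zero and the expression reduces to a copy of charsA[i]. A small sketch that checks this exhaustively (the class and method names are hypothetical and not part of the PR):

```java
// Hypothetical demonstration that (char) (a + (b >>> 16)) equals a for every char b.
final class CharShiftFold {
    // Same expression as the loop body in the test above, for one element.
    static char shiftAccumulate16(char a, char b) {
        return (char) (a + (b >>> 16));
    }

    // Tries all 65536 char values: each one shifted right by 16 must be 0.
    static boolean shiftBy16IsAlwaysZero() {
        for (int v = 0; v <= 0xFFFF; v++) {
            char b = (char) v;
            if ((b >>> 16) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shiftBy16IsAlwaysZero());                               // true
        System.out.println((int) shiftAccumulate16((char) 1234, (char) 0xFFFF));   // 1234
    }
}
```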


@nsjian nsjian left a comment

Looks good to me.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 1, 2021
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review and removed rfr Pull request is ready for review labels Feb 1, 2021
@dgbo
Member Author

dgbo commented Feb 2, 2021

Hi, Andrew.

The reason ssra is not generated in the .8B form is that when the loop size is 16, the vector length is 4, not 8.
Because vsraa8B_imm only has predicate(n->as_Vector()->length() == 8), they are not matched.
We should fix this with the following code:

 instruct vsraa8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (RShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra    $dst, $src, $shift\t# vector (8B)" %}
@@ -18782,7 +18782,7 @@ instruct vsraa16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsraa4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (RShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra    $dst, $src, $shift\t# vector (4H)" %}
@@ -18849,7 +18849,7 @@ instruct vsraa2L_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (URShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra    $dst, $src, $shift\t# vector (8B)" %}
@@ -18879,7 +18879,7 @@ instruct vsrla16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (URShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra    $dst, $src, $shift\t# vector (4H)" %}

What do you think about making this modification in this PR?

@nsjian

nsjian commented Feb 2, 2021

> Because vsraa8B_imm only has predicate(n->as_Vector()->length() == 8), they are not matched.
> We should fix this with the following code:

I think this is an enhancement, and should be done in a separate patch in jdk mainline.

@dgbo
Member Author

dgbo commented Feb 3, 2021

> > Because vsraa8B_imm only has predicate(n->as_Vector()->length() == 8), they are not matched.
> > We should fix this with the following code:
>
> I think this is an enhancement, and should be done in a separate patch in jdk mainline.

OK, I have updated a test to use loop size 80 for bytes, so that ssra for 8B can now be matched.

@dgbo
Member Author

dgbo commented Feb 3, 2021

Ping... Can I get a review for the newest changes? Please let me know if we are ready to go.

Contributor

@theRealAph theRealAph left a comment

OK, thanks.

@dgbo
Member Author

dgbo commented Feb 3, 2021

Thank you all for the review.
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 3, 2021
@openjdk

openjdk bot commented Feb 3, 2021

@dgbo
Your change (at version 9e71e0f) is now ready to be sponsored by a Committer.

@dean-long
Member

/sponsor

@openjdk openjdk bot closed this Feb 3, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 3, 2021
@openjdk

openjdk bot commented Feb 3, 2021

@dean-long @dgbo Since your change was applied there have been 10 commits pushed to the master branch:

  • 1a7040e: 8259794: Remove EA from JDK 16 version string starting with Initial RC promotion on Feb 04, 2021(B35)
  • afd5eef: 8260704: ParallelGC: oldgen expansion needs release-store for _end
  • 081fa3e: 8260927: StringBuilder::insert is incorrect without Compact Strings
  • ed1a775: 8258378: Final nroff manpage update for JDK 16
  • 21f8bf4: 8257215: JFR: Events dropped when streaming over a chunk rotation
  • 0fdf9cd: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled
  • bc41bb1: 8260632: Build failures after JDK-8253353
  • a117e11: 8260339: JVM crashes when executing PhaseIdealLoop::match_fill_loop
  • 8ffdbce: 8260608: add a regression test for 8260370
  • 1926765: 8253353: Crash in C2: guarantee(n != NULL) failed: No Node

Your commit was automatically rebased without conflicts.

Pushed as commit 5307afa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dgbo dgbo deleted the fix_vsla4Simm_typo branch February 4, 2021 01:52
Labels
hotspot hotspot-compiler hotspot-compiler-dev@openjdk.java.net integrated Pull request has been integrated
6 participants