Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes #1532

Closed
wants to merge 2 commits into from

Conversation

dgbo
Copy link
Member

@dgbo dgbo commented Dec 1, 2020

Currently, for all CPUs, if optimizing RotateLeftV and RotateRightV with match rules in AD files, we have to implement both immediate and variable versions.

On aarch64, with match rules for vector rotation, immediate vector rotatation can be optimized with shift+insert instructions (i.e. SLI/SRI, ~20% improvements with an initial implementation).
However there woule be performance regression for variable version, due to SLI/SRI have no register version in NEON intruction set and there is no register version for right shift neither.
The instructions for match rules of vector rotate variable should be:

    mov w9, 32
    dup v13.4s, w9
    sub  v20.4s, v13.4S, v19.4s
    # For a loop, default version only has code below,
    # the code above (loop invariants) are put outside of a loop.
    sshl v17.4s, v16.4s, v19.4s
    neg v18.16b, v20.16b       # on aarch64, vector right shift is implemented as left shift by negative shift count
    ushl v16.4s, v16.4s, v18.4s
    orr v16.16b, v17.16b, v16.16b

With this patch, immediate vector rotation can be matched alone and optimized on CPUs like aarch64.

Verified with linux-x86_64-server-fastdebug tier1-3 and passed.
Also added immediate vector rotation tests to micro test/micro/org/openjdk/bench/java/lang/RotateBenchmark.java.
Tested the micro on a x86_64/aarch64 server and witnessed no regressions.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1532/head:pull/1532
$ git checkout pull/1532

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 1, 2020

👋 Welcome back dongbo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 1, 2020
@openjdk
Copy link

openjdk bot commented Dec 1, 2020

@dgbo The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Dec 1, 2020
@dgbo
Copy link
Member Author

dgbo commented Dec 1, 2020

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Dec 1, 2020
@openjdk
Copy link

openjdk bot commented Dec 1, 2020

@dgbo
The core-libs label was successfully removed.

@mlbridge
Copy link

mlbridge bot commented Dec 1, 2020

Webrevs

Copy link

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you introduce new ideal nodes (RotateLeftImmV/RotateRightImmV)? You already extended is_vector_rotate_supported to take the shift argument into account.

Also, you mention aarch64, but I see only x86 changes.

@dgbo
Copy link
Member Author

dgbo commented Dec 2, 2020

Why do you introduce new ideal nodes (RotateLeftImmV/RotateRightImmV)? You already extended is_vector_rotate_supported to take the shift argument into account.

I think extended is_vector_rotate_supported can't offer what we are going after.

Aarch64 supports OrV/LShiftV/URShiftV, so is_vector_rotate_supported returns true anyway with or without Rotate*ImmV.
If we do not introduce RotateLeftImmV/RotateRightImmV nodes, RotateLeftV/RotateRightVnodes will be created for both immediate and variable rotation.
As mentioned before, variable rotation matching rules should no exsit in aarch64.ad, due to the performance regression.
It leads to a crash at src/hotspot/share/opto/matcher.cpp:1657 with message assert(false) failed: bad AD file.

If extended is_vector_rotate_supported return false anyhow for variable rotations like:

 bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
   assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");

+  AARCH64_ONLY(if(!is_imm) return false);
+
   // Get the opcode for rotate immediate
   if (is_imm) {
     vopc = vopc == Op_RotateLeftV ? Op_RotateLeftImmV : Op_RotateRightImmV;

The variable roations would not be vectorized by OrV/LShiftV/URShiftV combinations.

With RotateLeftImmV/RotateRightImmV nodes, we can match them separately and optimized in a CPU and keep the variable rotation the same as before.
I'am sorry if I miss something.

Also, you mention aarch64, but I see only x86 changes.

I tried to optimize immediate vector rotations with shift&insert in JDK-8256820 for aarch64.
WIth RotateLeftImmV/RotateRightImmV, an initial optimization for aarch64 is shown in attachment optimize_rotateImm_aarch64.txt
, tests are good.
But I also want to discuss some other modifications (for aarch64 only) in that issue, e.g. trivial fix for absolute and accumulate with small vector length, move shifting operations to aarch64_neon.ad, etc.
To make this progress clean and clear, and I think this splitting immediate roatation can be discussed separately, so I raised up this new PR. Hope this would not be a bother.

@iwanowww
Copy link

iwanowww commented Dec 2, 2020

Thanks for the detailed answer, but I'm still confused why RotateLeftImmV is more powerful than RotateLeftV + a check that shift is constant. Why can't RotateLeftV/RotateRightV vector nodes be allowed on some platforms only with constant shifts?

@iwanowww
Copy link

iwanowww commented Dec 2, 2020

It looks like you implicitly depend on the check that there are rules present which match that particular opcode:

const bool Matcher::match_rule_supported(int opcode) {
  if (!has_match_rule(opcode)) {
    return false; // no match rule present
  }

But you don't have to and I don't see why additional checks in VectorNode::is_vector_rotate_supported() can't be extended to cover that for you.

@dgbo
Copy link
Member Author

dgbo commented Dec 3, 2020

As far as I considered, RotateLeftV + a check in VectorNode::is_vector_rotate_supported() will degradate performance of rotating with variable shift.

Taking the following test for example.

  // Rotate with variable SHIFT, TESTSIZE = 1024
  public void testRotateRightI() {
    for (int i = 0; i < TESTSIZE; i++)
       ires[i] = Integer.rotateRight(iarr[i], SHIFT);
  }

On aarch64, we have scalar and vector instructions to implement the loop core of this test:

A. vector with NEON
sshl v17.4s, v16.4s, v19.4s              B. scalar
neg v18.16b, v20.16b                    ror    w13, w13, w10
ushl v16.4s, v16.4s, v18.4s
orr v16.16b, v17.16b, v16.16b

As of now, the default code generated by C2 is A under SuperWord framework.

Without RotateLeftImmV/RotateRightImmV, when we match RotateLeftV/RotateRightV only with constant shifts,
to avoid the BAD AD file crash, we have to add a check to return false in VectorNode::is_vector_rotate_supported(), like:

bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
  assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");

  /* --- Matcher::match_rule_supported() independent checks  added --- */
  AARCH64_ONLY(if (!is_imm) return false); 
  AARCH64_ONLY(if (is_imm) return true); 
  ...
  // Validate existence of nodes created in case of rotate degeneration.
  switch (bt) {
    case T_INT:
      return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
             Matcher::match_rule_supported_vector(Op_LShiftVI,  vlen, bt) &&
             Matcher::match_rule_supported_vector(Op_URShiftVI, vlen, bt);
    case T_LONG:
      return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
             Matcher::match_rule_supported_vector(Op_LShiftVL,  vlen, bt) &&
             Matcher::match_rule_supported_vector(Op_URShiftVL, vlen, bt);
    default:
      assert(false, "not supported: %s", type2name(bt));
      return false;
  }
}

But the check also tells C2 that RotateVRight with vairable shifts can't be vectorized.
The code is rolled back to scalar version. I check the asm code generated with modifications above (see [1]), it is B.
So we have a performance regression here, because RotateVRight can be vectorized via OrV+LShiftVI+URShiftVI (A).

[1] An implementation with RotateLeftV + a check in VectorNode::is_vector_rotate_supported() . Link: dgbo@2b3b01d

@iwanowww
Copy link

iwanowww commented Dec 3, 2020

Thanks for the clarifications. I think I understand now the issue you are facing.

As of now, it works as follows:

RotateLeft ==vectorizes==> RotateLeftV ==degenerates(?)==> OrV+LShiftV+URShiftV

VectorNode::is_vector_rotate_supported() checks that either RotateLeftV or OrV/LShiftV/URShiftV are supported.
Then RotateLeftV::Ideal checks for RotateLeftV support and replaces the node with OrV/LShiftV/URShiftV if it is absent.

Your fix proposes to introduce RotateLeftImmV and keep the checks in VectorNode::is_vector_rotate_supported() intact:

RotateLeft ==vectorizes==> RotateLeftV    ==degenerates(?)==> OrV+LShiftV+URShiftV
           ==vectorizes==> RotateLeftImmV ==degenerates(?)==> OrV+LShiftV+URShiftV

What I'm after is to keep existing nodes and adjust the checks in RotateLeftV::Ideal to take shift value into account and degrade to OrV+LShiftV+URShiftV depending on whether the platform finds constant/variable shifts profitable.

@dgbo
Copy link
Member Author

dgbo commented Dec 3, 2020

Yes, it works just the same as you mentioned. Thanks a lot to make this so clear.
The degeneration to OrV+LShiftV+URShiftV is at RotateLeftVNode::Ideal(), then VectorNode::degenerate_vector_rotate():
RotateLeft ==vectorizes==> RotateLeftV ==degenerates(RotateLeftVNode::Ideal, VectorNode::degenerate_vector_rotate())==> OrV+LShiftV+URShiftV

However, the workflow relys on the creation of RotateLeftV node, and the creation of RotateLeftV with variables should be forbidden if one CPU match RotateLeftV/RotateRightV only with constant shifts.
Otherwise, Matcher::Label_Root() will check wether the AD files have implemented the matching rules for RotateLeftV with variables.
If there are no rules for RotateLeftV with variables, while we have rules for RotateLeftV with constants, we get a crash at

assert( false, "bad AD file" );
.

IMHO, seems ajusting checks in RotateLeftV::Ideal does not deliver the solution, since there are no RotateRightV for variable shifts from the start.

@iwanowww
Copy link

iwanowww commented Dec 3, 2020

However, the workflow relys on the creation of RotateLeftV node, and the creation of RotateLeftV with variables should be forbidden if one CPU match RotateLeftV/RotateRightV only with constant shifts.

But degeneration step is specifically designed to cover the case when rotate vector node is not supported by the matcher.
It is perfectly fine to instantiate RotateV node considering it will be replaced by OrV+LShiftV+URShiftV before matcher kicks in. You just need to adjust the check so it works properly, don't you?

@openjdk
Copy link

openjdk bot commented Dec 7, 2020

@dgbo this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout split_vector_rotate
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed rfr Pull request is ready for review labels Dec 7, 2020
@openjdk openjdk bot added rfr Pull request is ready for review and removed merge-conflict Pull request has merge conflict with target branch labels Dec 7, 2020
@dgbo
Copy link
Member Author

dgbo commented Dec 7, 2020

Sorry, my mistake, ajusting the checks would be enough.

Seems we have to pass the in(2)->is_Con() to code implemented by a CPU, so that the CPU can return wether the CPU supports RotateRightV and RotateLeftV variables or not.
I extended the exsiting bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) to bool match_rule_supported_vector(int opcode, int vlen, BasicType bt, int ext = MATCH_RULE_SUPPORT_VECTOR_HAS_NO_EXTINFO).
This function works the same as before if the new added ext equals to default MATCH_RULE_SUPPORT_VECTOR_HAS_NO_EXTINFO.
In this case, the new added parameter ext is supposed to indicate the shift is a constant or a variable for RotateRightV and RotateLeftV.
We can handle it at the start of when neccessary, for optimizing RotateRightV and RotateLeftV on aarch64, like:

-const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
+const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt, int ext) {
+  if (Matcher::match_rule_supported_vector_has_extinfo(ext)) {
+    switch(opcode) {
+      case Op_RotateRightV:
+      case Op_RotateLeftV:
+        if (VectorNode::is_rotate_vector_var(ext)) {
+          return false;
+       }
+        break;
+      default:
+        break;
+    }
+  }

With this, I think it is also easy to extended to cover other possible special CPU dependent checks for match rules in the future, like partially supports for other opcode.

@dgbo
Copy link
Member Author

dgbo commented Dec 7, 2020

Hi, @iwanowww

I have updated a version, in which I ajusted the checks in function Ideal.
But I haven't recieved related mails of this version from mailist hotspot-compiler-dev by far.
In case you haven't netither, the modifications are available at https://github.com/openjdk/jdk/pull/1532/files.

Thanks.

@@ -1172,7 +1182,8 @@ Node* VectorNode::degenerate_vector_rotate(Node* src, Node* cnt, bool is_rotate_
Node* RotateLeftVNode::Ideal(PhaseGVN* phase, bool can_reshape) {
int vlen = length();
BasicType bt = vect_type()->element_basic_type();
if (!Matcher::match_rule_supported_vector(Op_RotateLeftV, vlen, bt)) {
int extinfo = encode_rotate_vector_shift_type(in(2));
if (!Matcher::match_rule_supported_vector(Op_RotateLeftV, vlen, bt, extinfo)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value in encoding and packing node-specific info in an int. Why not simply introduce a new platform-specific entry and pass a Node* instead letting platform-specific code to extract any useful information from it?

As an alternative, introduce a new capability (Matcher::supports_vector_variable_rotates) and check it in shared code (RotateLeftVNode::Ideal).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced Matcher::supports_vector_variable_rotates(VectorNode *n) in the updated commit d2cb793.
Verified it both on AARCH64 and X86_64 platforms with fastdebug build. Tests are good.

@@ -1814,6 +1814,11 @@ bool Matcher::supports_vector_variable_shifts(void) {
return (UseAVX >= 2);
}

bool Matcher::supports_vector_variable_rotates(VectorNode *n) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, turn it into a capability check (just bool Matcher::supports_vector_variable_rotates(void), no argument).
There's no need to wrap Matcher::match_rule_supported_vector since you already check it:

(in(2)->is_Con() && !Matcher::supports_vector_variable_rotates(this)) ||
!Matcher::match_rule_supported_vector(Op_RotateLeftV, vlen, bt)) {

Otherwise, looks good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Turned the capability check into bool Matcher::supports_vector_variable_rotates(void). On x86_64, it returns true directly now.

Copy link

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Dec 10, 2020

@dgbo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

Reviewed-by: vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 90 new commits pushed to the master branch:

  • 0a0691e: 8257901: ZGC: Take virtual memory usage into account when sizing heap
  • 29ffffa: 8257997: sun/security/ssl/SSLSocketImpl/SSLSocketLeak.java again reports leaks after JDK-8257884
  • db5da96: 8257876: Avoid Reference.isEnqueued in tests
  • 4a839e9: 8256459: java/net/httpclient/ManyRequests.java and java/net/httpclient/LineBodyHandlerTest.java fail infrequently with java.net.ConnectException: Connection timed out: no further information
  • d93293f: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well
  • 869dcb6: 8257806: Optimize x86 allTrue and anyTrue vector mask operations of Vector API
  • 34650f5: 8257872: UL: -Xlog does not check number of options
  • 6847bbb: 8255918: XMLStreamFilterImpl constructor consumes XMLStreamException
  • d2f9e31: 8257638: Update usage of "type" terminology in javax.lang.model
  • f631a99: 8256888: Client manual test problem list update
  • ... and 80 more: https://git.openjdk.java.net/jdk/compare/05dac03f36a593c3e838552639ebf6820fd6c504...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@iwanowww) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 10, 2020
@dgbo
Copy link
Member Author

dgbo commented Dec 10, 2020

@iwanowww Thanks a lot for the review.
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Dec 10, 2020
@openjdk
Copy link

openjdk bot commented Dec 10, 2020

@dgbo
Your change (at version faea260) is now ready to be sponsored by a Committer.

@RealFYang
Copy link
Member

/sponsor

@openjdk openjdk bot closed this Dec 10, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 10, 2020
@openjdk
Copy link

openjdk bot commented Dec 10, 2020

@RealFYang @dgbo Since your change was applied there have been 90 commits pushed to the master branch:

  • 0a0691e: 8257901: ZGC: Take virtual memory usage into account when sizing heap
  • 29ffffa: 8257997: sun/security/ssl/SSLSocketImpl/SSLSocketLeak.java again reports leaks after JDK-8257884
  • db5da96: 8257876: Avoid Reference.isEnqueued in tests
  • 4a839e9: 8256459: java/net/httpclient/ManyRequests.java and java/net/httpclient/LineBodyHandlerTest.java fail infrequently with java.net.ConnectException: Connection timed out: no further information
  • d93293f: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well
  • 869dcb6: 8257806: Optimize x86 allTrue and anyTrue vector mask operations of Vector API
  • 34650f5: 8257872: UL: -Xlog does not check number of options
  • 6847bbb: 8255918: XMLStreamFilterImpl constructor consumes XMLStreamException
  • d2f9e31: 8257638: Update usage of "type" terminology in javax.lang.model
  • f631a99: 8256888: Client manual test problem list update
  • ... and 80 more: https://git.openjdk.java.net/jdk/compare/05dac03f36a593c3e838552639ebf6820fd6c504...master

Your commit was automatically rebased without conflicts.

Pushed as commit 026b09c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dgbo dgbo deleted the split_vector_rotate branch December 10, 2020 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
3 participants