
8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask #3238

Closed. Wants to merge 5 commits into master.

Conversation

@XiaohongGong (Contributor) commented Mar 29, 2021

The Vector API defines different element types for a floating point VectorMask. For example, the bitwise-related APIs use "long/int", while the data-related APIs use "double/float". As a result, some optimizations based on type checking cannot work well.

For example, the VectorBox/Unbox elimination "VectorUnbox (VectorBox v) ==> v" requires that the output and input types are equal. Normally this is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:

  VectorLoadMask (VectorStoreMask vmask)

In such cases the types can actually be treated as the same. Considering that the vector mask representation is identical for vectors with the same element size and vector length, it is safe to do the optimization:

  VectorLoadMask (VectorStoreMask vmask) ==> vmask

The same issue exists for GVN, which is based on the type of a node. Making the mask node's hash()/cmp() methods depend on the element size instead of the element type fixes it.
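To illustrate the intended fold, here is a minimal, self-contained sketch, using hypothetical types rather than HotSpot's real node classes, of the VectorLoadMask (VectorStoreMask vmask) ==> vmask identity guarded by the element-size and vector-length checks described above:

```cpp
#include <cstddef>

// Hypothetical, simplified node model; HotSpot's real VectorLoadMaskNode /
// VectorStoreMaskNode are far richer. This only sketches the guard logic.
struct MaskNode {
    enum Kind { Value, StoreMask, LoadMask };
    Kind kind;
    std::size_t elem_size;   // element size in bytes (e.g. 8 for long/double)
    std::size_t length;      // number of lanes
    MaskNode* input;         // producing node, if any
};

// VectorLoadMask (VectorStoreMask vmask) ==> vmask, but only when the mask
// bit representation is provably identical: same element size, same length.
MaskNode* load_mask_identity(MaskNode* load) {
    if (load->kind == MaskNode::LoadMask &&
        load->input != nullptr && load->input->kind == MaskNode::StoreMask) {
        MaskNode* vmask = load->input->input;
        if (vmask != nullptr &&
            vmask->elem_size == load->elem_size &&
            vmask->length    == load->length) {
            return vmask;    // fold the round-trip away
        }
    }
    return load;             // keep the node: representations may differ
}
```

Under this model, a double-typed mask reloaded as a long-typed mask of the same species satisfies both guards (8-byte elements, same lane count) and folds away, while a reload with a different element size is kept.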


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238
$ git checkout pull/3238

Update a local copy of the PR:
$ git checkout pull/3238
$ git pull https://git.openjdk.java.net/jdk pull/3238/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3238

View PR using the GUI difftool:
$ git pr show -t 3238

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3238.diff

bridgekeeper bot commented Mar 29, 2021

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 29, 2021
openjdk bot commented Mar 29, 2021

@XiaohongGong The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 29, 2021
mlbridge bot commented Mar 29, 2021

Webrevs

@XiaohongGong (Contributor, Author):

Hi, could anyone please take a look at this PR? Thanks so much!

@vnkozlov (Contributor) left a comment


@iwanowww should confirm the correctness of such an optimization.
Regarding the changes, they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit an assert.

iwanowww commented Apr 7, 2021

I'm not fond of the proposed approach.

It hard-codes some implicit assumptions about vector mask representation.
Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.

I suggest introducing artificial cast nodes (VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)), which are then lowered into no-ops on the backend side.

@XiaohongGong (Contributor, Author):

Hi @iwanowww , thanks for looking at this PR.

It hard-codes some implicit assumptions about vector mask representation.
Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.

Agreed. I'm also worried about the potential assertion failures, although I haven't hit the issue so far.

I suggest introducing artificial cast nodes (VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)), which are then lowered into no-ops on the backend side.

It's a good idea to add a cast node for masks. I tried it, and it works well for casts with the same element size and vector length. However, for casts between different element sizes (e.g. short -> int), I think the original VectorLoadMask (VectorStoreMask vmask) is better, since some optimizations already exist for it. Also, using VectorMaskCast for all cases needs more match rules in the backend, which I think would duplicate VectorLoadMask + VectorStoreMask. So does it look good to you if VectorMaskCast is limited to mask casts with the same element size? The changes might look like:

diff --git a/src/hotspot/share/opto/vectornode.cpp b/src/hotspot/share/opto/vectornode.cpp
index 326b9c4a5a0..d7b53c247fb 100644
--- a/src/hotspot/share/opto/vectornode.cpp
+++ b/src/hotspot/share/opto/vectornode.cpp
@@ -1232,6 +1232,10 @@ Node* VectorUnboxNode::Ideal(PhaseGVN* phase, bool can_reshape) {
         bool is_vector_mask    = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorMask_klass());
         bool is_vector_shuffle = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorShuffle_klass());
         if (is_vector_mask) {
+          if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
+              Matcher::match_rule_supported(Op_VectorMaskCast)) {
+            return new VectorMaskCastNode(value, out_vt);
+          }
           // VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)
           value = phase->transform(VectorStoreMaskNode::make(*phase, value, in_vt->element_basic_type(), in_vt->length()));
           return new VectorLoadMaskNode(value, out_vt);

@XiaohongGong (Contributor, Author):

@iwanowww should confirm the correctness of such an optimization.
Regarding the changes, they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit an assert.

Thanks for looking at this PR, @vnkozlov. To be honest, I've no idea about the TOP checking issue for the inputs of the VectorNode. I hope @iwanowww could explain more. Thanks!

iwanowww commented Apr 8, 2021

So does it look good to you if VectorMaskCast is limited to mask casts with the same element size?

Yes, I'm fine with focusing on the no-op case for now.
Moreover, both the vector length and the element size should match to reuse the very same bit representation of the mask.

iwanowww commented Apr 8, 2021

I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit an assert.

VectorLoadMaskNode::Identity() can't observe TOP types because it uses the type cached at construction (type(); VectorNode extends TypeNode). Still, a TOP input is possible and should be filtered out by an opcode check (in(1)->Opcode() == Op_VectorStoreMask).
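The point above can be sketched with a tiny model (illustrative names only, not the real HotSpot code): the rewrite never queries the input's type, so a dead (TOP) input is harmless; the opcode comparison alone decides whether the fold applies.

```cpp
// Minimal model: a node is identified by its opcode; a dead input shows up
// as Op_Top. The Identity()-style rule below only compares opcodes, so it
// never dereferences type information of a potentially dead input.
enum Opcode { Op_Top, Op_VectorStoreMask, Op_VectorLoadMask };

struct Node {
    Opcode opcode;
    Node* in1;   // single input; becomes a TOP node when dead
};

Node* identity(Node* self) {
    // The opcode check filters TOP inputs out: the fold only fires when
    // the input really is a VectorStoreMask.
    if (self->opcode == Op_VectorLoadMask &&
        self->in1 != nullptr && self->in1->opcode == Op_VectorStoreMask) {
        return self->in1->in1;   // VectorLoadMask (VectorStoreMask v) ==> v
    }
    return self;                  // TOP or anything else: no transformation
}
```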

@vnkozlov (Contributor) left a comment


Okay.

openjdk bot commented Apr 8, 2021

@XiaohongGong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Reviewed-by: kvn, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • 64e2130: 8262108: SimpleDateFormat formatting broken for sq_MK Locale

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @iwanowww) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 8, 2021
@XiaohongGong (Contributor, Author):

Hi @iwanowww, I've updated the code to use VectorMaskCast for vector mask casts with the same element size and vector length. Would you mind having a look again? Thanks!

@iwanowww left a comment


Overall, looks good.

@@ -1232,6 +1232,11 @@ Node* VectorUnboxNode::Ideal(PhaseGVN* phase, bool can_reshape) {
        bool is_vector_mask    = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorMask_klass());
        bool is_vector_shuffle = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorShuffle_klass());
        if (is_vector_mask) {
          if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
              Matcher::match_rule_supported(Op_VectorMaskCast)) {
            // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)

It's better to implement it as a 2-step transformation and place it in VectorLoadMaskNode::Ideal():

VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)

@XiaohongGong (Contributor, Author) replied:

Thanks for your comments. Yes, theoretically it's better to place it in VectorLoadMaskNode::Ideal(). Unfortunately, we hit an issue related to an existing optimization for VectorStoreMask. Consider the following case:

   LoadVector                                              LoadVector
           |                                                    |
   VectorLoadMask  (double)                          VectorLoadMask (double)
               |                                               |
           VectorUnbox   (long)          ==>           VectorStoreMask  (double)
                                                               |
                                                      VectorLoadMask  (long)

This case loads the mask values for a double type and does a bitwise "and" operation. Since the types are mismatched, the compiler will try to do VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask). However, since there is the transformation VectorStoreMask (VectorLoadMask value) ==> value, the VectorStoreMask node above will be optimized out. The final graph looks like:

   LoadVector                                                  LoadVector
          |                                                      /      \
        VectorLoadMask  (double)                               /         \
              |                      ==>     VectorLoadMask (double)      \
         VectorStoreMask (double)                                   VectorLoadMask (long)
                   |
            VectorLoadMask (long)

Since the two VectorLoadMask nodes have different element types, GVN cannot optimize one of them out, so finally there are two similar VectorLoadMask nodes. That's also why I overrode cmp/hash for VectorLoadMaskNode in the first version.

So I prefer to add VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask) directly.
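The GVN behavior described above can be sketched as follows, with hypothetical cmp functions rather than HotSpot's real code: under a type-based comparison the two VectorLoadMask nodes (#double and #long) sharing the same LoadVector input never compare equal, so GVN cannot common them; a size-based comparison, as in the first version of this patch, would merge them.

```cpp
#include <cstddef>

// Toy basic types and a toy node keyed the way GVN value-numbers nodes:
// opcode, inputs, and type. Illustrative only.
enum BasicType { T_INT, T_FLOAT, T_LONG, T_DOUBLE };

std::size_t elem_size(BasicType t) {
    return (t == T_LONG || t == T_DOUBLE) ? 8 : 4;
}

struct Node {
    int opcode;
    const void* input;     // the shared LoadVector input in the example above
    BasicType elem;        // element type of the mask
    std::size_t length;    // number of lanes
};

// GVN's default view: nodes with different element types are different.
bool cmp_by_type(const Node& a, const Node& b) {
    return a.opcode == b.opcode && a.input == b.input &&
           a.elem == b.elem && a.length == b.length;
}

// The first version of the patch: compare by element size, so a double
// mask and a long mask with the same lane count become one node.
bool cmp_by_size(const Node& a, const Node& b) {
    return a.opcode == b.opcode && a.input == b.input &&
           elem_size(a.elem) == elem_size(b.elem) && a.length == b.length;
}
```

With a shared input, the #double and #long nodes are unequal under cmp_by_type but equal under cmp_by_size, which is exactly why only the latter lets GVN remove the duplicate.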


Ok, so you face a transformation ordering problem here.

By working on VectorUnbox (VectorBox vmask) you effectively delay VectorStoreMask (VectorLoadMask vmask) => vmask transformation.

As an alternative you could:

(1) check for VectorLoadMask users before applying VectorStoreMask (VectorLoadMask vmask) => vmask;

(2) nest adjacent casts:

VectorLoadMask #double (1 LoadVector)
VectorLoadMask #long   (1 LoadVector)
==>
VectorMaskCast #long (VectorLoadMask #double (1 LoadVector))

The latter looks more powerful (and hence preferable), but I'm fine with what you have right now. (It can be enhanced later.)
Please leave a comment describing the motivation for doing the transformation directly on VectorUnbox (VectorBox ...).

@XiaohongGong (Contributor, Author) replied:

Thanks for your alternative advice! I prefer to keep the code as it is for now, and the comments have been added. Thanks!

src/hotspot/share/opto/vectornode.cpp (outdated; resolved)
src/hotspot/share/opto/vectornode.hpp (resolved)
@XiaohongGong (Contributor, Author):

Hi @iwanowww, all your review comments have been addressed. Would you mind having a look at it again? Thanks!

@@ -1237,6 +1237,17 @@ class VectorStoreMaskNode : public VectorNode {
static VectorStoreMaskNode* make(PhaseGVN& gvn, Node* in, BasicType in_type, uint num_elem);
};

class VectorMaskCastNode : public VectorNode {
Member:

VectorMaskReinterpret seems a better choice, since it's a re-interpretation and not a cast (up/down).


Considering masks have platform-specific representation, full-blown casts between different element types look more appropriate here.

In this particular case, the focus is on the cheapest possible case, where the representations share the same bit pattern and the cast degenerates into a no-op. But in the longer term, it makes perfect sense to support the full matrix of conversions and not rely on VectorLoadMask <=> VectorStoreMask and an intermediate canonical vector representation.

Member:

Got it.

@XiaohongGong (Contributor, Author):

Hi @jatin-bhateja @iwanowww, all your comments have been addressed. Could you please take a look at it again? Thanks!

@jatin-bhateja (Member):

Hi @jatin-bhateja @iwanowww, all your comments have been addressed. Could you please take a look at it again? Thanks!

Thanks for addressing it. As Vladimir suggested, in the long term we can just emit this node instead of the VectorStoreMask + VectorLoadMask combination when the mask types are non-conformal, to emit an efficient instruction sequence using input mask packing/unpacking.

@iwanowww left a comment


Looks good.

@XiaohongGong (Contributor, Author):

Thanks for addressing it. As Vladimir suggested, in the long term we can just emit this node instead of the VectorStoreMask + VectorLoadMask combination when the mask types are non-conformal, to emit an efficient instruction sequence using input mask packing/unpacking.

Yeah, I thought about this previously. However, considering that there are some existing optimizations like GVN for VectorStoreMask, I currently prefer to keep the code as it is. Maybe we can reconsider this in the future. Thanks!

@XiaohongGong (Contributor, Author):

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 16, 2021
openjdk bot commented Apr 16, 2021

@XiaohongGong
Your change (at version 8232bd9) is now ready to be sponsored by a Committer.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Apr 16, 2021
@XiaohongGong (Contributor, Author):

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 16, 2021
openjdk bot commented Apr 16, 2021

@XiaohongGong
Your change (at version e4352b3) is now ready to be sponsored by a Committer.

nsjian commented Apr 16, 2021

/sponsor

@openjdk openjdk bot closed this Apr 16, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 16, 2021
openjdk bot commented Apr 16, 2021

@nsjian @XiaohongGong Since your change was applied there has been 1 commit pushed to the master branch:

  • 64e2130: 8262108: SimpleDateFormat formatting broken for sq_MK Locale

Your commit was automatically rebased without conflicts.

Pushed as commit e0151a6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants