8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask #3238
Conversation
@XiaohongGong The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
(Force-pushed from af74280 to 466cf11, then to bf5b202 — "…box for floating point VectorMask".)
Hi, could anyone please help to look at this PR? Thanks so much!
@iwanowww should confirm correctness of such optimization.
Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.
I'm not fond of the proposed approach. It hard-codes implicit assumptions about the vector mask representation. I suggest introducing artificial cast nodes instead.
Hi @iwanowww, thanks for looking at this PR.
Agree. I'm also anxious about the potential assertion failure, although I haven't hit the issue so far.
It's a good idea to add a cast node for the mask. I tried it, and it works well for casts with the same element size and vector length. However, for casts with a different element size (e.g. short -> int), I think the original
Thanks for looking at this PR @vnkozlov. To be honest, I have no idea about the TOP-checking issue for the inputs of VectorNode. Hope @iwanowww could explain more. Thanks!
Yes, I'm fine with focusing on the no-op case for now.
Okay.
@XiaohongGong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the
As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @iwanowww), but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
Hi @iwanowww, I've updated the code to use
Overall, looks good.
@@ -1232,6 +1232,11 @@ Node* VectorUnboxNode::Ideal(PhaseGVN* phase, bool can_reshape) {
   bool is_vector_mask = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorMask_klass());
   bool is_vector_shuffle = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorShuffle_klass());
   if (is_vector_mask) {
+    if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
+        Matcher::match_rule_supported(Op_VectorMaskCast)) {
+      // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)
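The guard above fires only when the input and output vector types occupy the same total number of bytes; in that case the mask layouts are bit-identical and the cast degenerates into a no-op. Below is a toy model of that length_in_bytes() comparison. The VecType class and all names are hypothetical stand-ins, not HotSpot's TypeVect:

```java
public class MaskCastGuard {
    // Hypothetical stand-in for HotSpot's TypeVect: element size in
    // bytes times the lane count gives the total length in bytes.
    static final class VecType {
        final int elemBytes, lanes;
        VecType(int elemBytes, int lanes) { this.elemBytes = elemBytes; this.lanes = lanes; }
        int lengthInBytes() { return elemBytes * lanes; }
    }

    // The unbox can degenerate into a (no-op) VectorMaskCast only when
    // both mask types occupy the same number of bytes.
    static boolean castIsNoop(VecType in, VecType out) {
        return in.lengthInBytes() == out.lengthInBytes();
    }

    public static void main(String[] args) {
        VecType doubleMask = new VecType(8, 2); // 2 x 64-bit lanes
        VecType longMask   = new VecType(8, 2); // 2 x 64-bit lanes
        VecType intMask    = new VecType(4, 2); // 2 x 32-bit lanes
        if (!castIsNoop(doubleMask, longMask)) throw new AssertionError();
        if (castIsNoop(doubleMask, intMask))   throw new AssertionError();
    }
}
```

The Matcher::match_rule_supported check in the real patch additionally ensures the backend can match the new node at all; the toy model only captures the size comparison.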
It's better to implement it as a 2-step transformation and place it in VectorLoadMaskNode::Ideal():
VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)
Thanks for your comments. Yes, theoretically it's better to place it in VectorLoadMaskNode::Ideal(). Unfortunately, we hit an issue related to the optimization for VectorStoreMask. Consider the following case:

LoadVector                   LoadVector
    |                            |
VectorLoadMask (double)      VectorLoadMask (double)
    |                 ==>        |
VectorUnbox (long)           VectorStoreMask (double)
                                 |
                             VectorLoadMask (long)

This case loads the masking values for a double type and does a bitwise and operation. Since the type is mismatched, the compiler will try to do VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask). However, since there is the transformation VectorStoreMask (VectorLoadMask value) ==> value, the above VectorStoreMaskNode will be optimized out. The final graph looks like:

LoadVector                          LoadVector
    |                                /      \
VectorLoadMask (double)             /        \
    |                    ==>  VectorLoadMask  VectorLoadMask
VectorStoreMask (double)        (double)        (long)
    |
VectorLoadMask (long)

Since the two VectorLoadMaskNodes have different element types, GVN cannot optimize one of them away, so finally there will be two similar VectorLoadMaskNodes. That's also why I overrode cmp/hash for VectorLoadMaskNode in the first version.

So I prefer to add VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask) directly.
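The GVN behavior described here can be modeled in a few lines: if value numbering keys a VectorLoadMask on its element type, the double and long variants over the same input stay distinct, while keying on the element size would common them up. This is a hypothetical sketch of the idea, not HotSpot's NodeHash:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy value numbering: nodes are (opcode, elemType, input) triples.
// Keying VectorLoadMask by element *type* keeps the double and long
// variants distinct; keying by element *size* (8 bytes for both)
// commons them up.  Purely illustrative, not HotSpot's GVN.
public class ToyGvn {
    static final Map<String, Integer> ELEM_SIZE =
        Map.of("double", 8, "long", 8, "float", 4, "int", 4);

    static int uniqueNodes(List<String[]> nodes, boolean keyBySize) {
        Set<String> table = new HashSet<>();
        for (String[] n : nodes) { // n = {opcode, elemType, input}
            String typeKey = keyBySize ? ELEM_SIZE.get(n[1]).toString() : n[1];
            table.add(n[0] + "#" + typeKey + "@" + n[2]);
        }
        return table.size();
    }

    public static void main(String[] args) {
        List<String[]> nodes = List.of(
            new String[]{"VectorLoadMask", "double", "LoadVector1"},
            new String[]{"VectorLoadMask", "long",   "LoadVector1"});
        // Keyed by type: both survive.  Keyed by size: deduplicated.
        if (uniqueNodes(nodes, false) != 2) throw new AssertionError();
        if (uniqueNodes(nodes, true)  != 1) throw new AssertionError();
    }
}
```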
Ok, so you face a transformation ordering problem here.
By working on VectorUnbox (VectorBox vmask) you effectively delay the VectorStoreMask (VectorLoadMask vmask) => vmask transformation.
As an alternative you could:
(1) check for VectorLoadMask users before applying VectorStoreMask (VectorLoadMask vmask) => vmask;
(2) nest adjacent casts:

VectorLoadMask #double (1 LoadVector)
VectorLoadMask #long   (1 LoadVector)
==>
VectorMaskCast #long (VectorLoadMask #double (1 LoadVector))

The latter looks more powerful (and hence preferable), but I'm fine with what you have right now. (It can be enhanced later.)
Please leave a comment describing the motivation for doing the transformation directly on VectorUnbox (VectorBox ...).
Thanks for your alternative advice! I prefer to keep the code as it is right now. Also, the comments have been added. Thanks!
Hi @iwanowww, all your review comments have been addressed. Would you mind having a look at it again? Thanks!
@@ -1237,6 +1237,17 @@ class VectorStoreMaskNode : public VectorNode {
   static VectorStoreMaskNode* make(PhaseGVN& gvn, Node* in, BasicType in_type, uint num_elem);
 };

+class VectorMaskCastNode : public VectorNode {
VectorMaskReinterpret seems a better choice, since it's a re-interpretation and not a cast (up/down).
Considering that masks have a platform-specific representation, full-blown casts between different element types look more appropriate here.
In this particular case, the focus is on the cheapest possible case, when the representations share the same bit pattern and the cast degenerates into a no-op. But in the longer term, it makes perfect sense to support the full matrix of conversions and not rely on VectorLoadMask <=> VectorStoreMask and an intermediate canonical vector representation.
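To illustrate the "full matrix" point: when element sizes differ, a mask cast is a real conversion rather than a bit-pattern no-op. A hypothetical model, treating each mask lane as an all-ones/all-zeros word of the lane width (this is an illustration, not HotSpot code):

```java
// Hypothetical model of a non-trivial VectorMaskCast: the logical lane
// values (set / unset) are preserved while the per-lane width changes.
// A double mask lane is modeled as a 64-bit all-ones/all-zeros word; a
// float mask lane is the 32-bit equivalent.
public class MaskCastModel {
    static int[] castDoubleMaskToFloatMask(long[] dmask) {
        int[] fmask = new int[dmask.length];
        for (int i = 0; i < dmask.length; i++) {
            boolean set = dmask[i] != 0L; // recover the logical lane value
            fmask[i] = set ? -1 : 0;      // repack at 32-bit lane width
        }
        return fmask;
    }

    public static void main(String[] args) {
        long[] dmask = { -1L, 0L };       // {set, unset}
        int[] fmask = castDoubleMaskToFloatMask(dmask);
        if (fmask[0] != -1 || fmask[1] != 0) throw new AssertionError();
    }
}
```

A real backend would do this repacking with vector narrowing/widening instructions rather than a scalar loop; the point is only that lane values survive while the representation changes.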
Got it.
Hi @jatin-bhateja @iwanowww, all your comments have been addressed. Could you please take a look at it again? Thanks!
Thanks for addressing it. As Vladimir suggested, in the long term we can just emit this node, instead of the VectorStoreMask + VectorLoadMask combination, when the mask types are non-conformal, to produce an efficient instruction sequence using input mask packing/unpacking.
Looks good.
Yeah, I thought about this previously. However, considering there are some optimizations like GVN for
/integrate
@XiaohongGong |
/integrate
@XiaohongGong |
/sponsor
@nsjian @XiaohongGong Since your change was applied there has been 1 commit pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit e0151a6. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "long/int", while the data related APIs use "double/float". This makes some optimizations that are based on type checking unable to work well. For example, the VectorBox/Unbox elimination "VectorUnbox (VectorBox v) ==> v" requires that the types of the output and input are equal. Normally this is necessary. However, due to the different element types for a floating point VectorMask with the same species, the VectorBox/Unbox pattern is optimized to:
Actually the types can be treated as the same one for such cases. And considering that the vector mask representation is the same for vectors with the same element size and vector length, it's safe to do the optimization:
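The safety argument rests on the mask representation: a mask lane is materialized as an all-ones or all-zeros bit pattern of the lane width, so for two element types of the same size (e.g. double and long, both 64-bit) the mask bits are identical and no conversion is needed. A minimal illustration of that assumption (hypothetical model, not JDK code):

```java
// A mask lane modeled as an all-ones/all-zeros word of the lane width.
// For same-sized element types (double vs long: both 64 bits) the bit
// pattern is identical, so a mask "cast" between them changes no bits.
public class SameSizeMask {
    static long maskLane64(boolean set) {
        return set ? -1L : 0L; // all ones or all zeros, 64-bit lane
    }

    public static void main(String[] args) {
        // The "double mask" lane and the "long mask" lane for the same
        // predicate are the same 64-bit word.
        if (maskLane64(true) != maskLane64(true)) throw new AssertionError();
        if (maskLane64(false) != 0L) throw new AssertionError();
    }
}
```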
The same issue exists for GVN, which is based on the type of a node. Making the mask node's hash()/cmp() methods depend on the element size instead of the element type can fix it.
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238
$ git checkout pull/3238
Update a local copy of the PR:
$ git checkout pull/3238
$ git pull https://git.openjdk.java.net/jdk pull/3238/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3238
View PR using the GUI difftool:
$ git pr show -t 3238
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3238.diff