Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266287: Basic mask IR implementation for the Vector API masking feature support #78

Closed

Conversation

@XiaohongGong
Copy link
Collaborator

@XiaohongGong XiaohongGong commented May 6, 2021

Based on [1], this patch adds the C2 compiler mid-end changes for the masking feature support. It mainly contains:

  1. Generation of the mask IRs for vector mask, including:
    • Mask generations, e.g. load/compare/maskAll
    • Mask operations, e.g. and/or/xor
  2. Conversions between vector and mask after loading mask values from memory and before storing mask values into memory
  3. Generation of the vector IRs which need the mask value as the control
    • The mask value is appended to the original vector node's input list

With this change, the bottom type of the vector mask will be set to "TypeVectMask" if the platform supports the masking feature and the backend implementations are added.

Note that this patch only contains the compiler mid-end changes. The backend implementations for SVE/AVX-512 will be in the
followed-up patches.

[1] #57


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8266287: Basic mask IR implementation for the Vector API masking feature support

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/78/head:pull/78
$ git checkout pull/78

Update a local copy of the PR:
$ git checkout pull/78
$ git pull https://git.openjdk.java.net/panama-vector pull/78/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 78

View PR using the GUI difftool:
$ git pr show -t 78

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/78.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented May 6, 2021

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into vectorIntrinsics+mask will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label May 6, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented May 6, 2021

Webrevs

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented May 8, 2021

Hi @sviswa7 @PaulSandoz @iwanowww @jatin-bhateja, this is the C2 compiler change side for the masking feature support. Could you please take a look at this PR? Thanks so much!

macro(MaskToVector)
macro(AndVMask)
macro(OrVMask)
macro(XorVMask)
Copy link
Member

@jatin-bhateja jatin-bhateja May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have used existing IR nodes for other commutative vector operations like Add/Sub/Mul, why specialized masked IR for OrV/AndV/XorV

Copy link
Collaborator Author

@XiaohongGong XiaohongGong May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three nodes are used for the vector mask logical operations while not for the masked vector logics. E.g mask.and()/mask.or()/mask.xor().

int cnt = sizeof(commut_op_list)/sizeof(char*);

if( _lChild && _rChild && (_lChild->_lChild || _rChild->_lChild) ) {
static const char *commut_vector_op_list[] = {
"AddVB", "AddVS", "AddVI", "AddVL", "AddVF", "AddVD",
"MulVB", "MulVS", "MulVI", "MulVL", "MulVF", "MulVD",
"AndV", "OrV", "XorV",
"MaxV", "MinV"
};

if (_lChild && _rChild && (_lChild->_lChild || _rChild->_lChild)) {
// Don't swap if right operand is an immediate constant.
bool is_const = false;
if( _rChild->_lChild == NULL && _rChild->_rChild == NULL ) {
if (_rChild->_lChild == NULL && _rChild->_rChild == NULL) {
FormDict &globals = _AD.globalNames();
const Form *form = globals[_rChild->_opType];
if ( form ) {
OperandForm *oper = form->is_operand();
if( oper && oper->interface_type(globals) == Form::constant_interface )
if (form) {
OperandForm *oper = form->is_operand();
if (oper && oper->interface_type(globals) == Form::constant_interface)
is_const = true;
}
}
if( !is_const ) {
for( int i=0; i<cnt; i++ ) {
if( strcmp(_opType, commut_op_list[i]) == 0 ) {
count++;
_commutative_id = count; // id should be > 0

if (!is_const) {
int scalar_cnt = sizeof(commut_op_list)/sizeof(char*);
int vector_cnt = sizeof(commut_vector_op_list)/sizeof(char*);
bool matched = false;

// Check the commutative vector op first. It's noncommutative if
// the current node is a masked vector op, since a mask value
// is added to the original vector node's input list and the original
// first two inputs are packed into one BinaryNode. So don't swap
// if one of the operands is a BinaryNode.
for (int i = 0; i < vector_cnt; i++) {
if (strcmp(_opType, commut_vector_op_list[i]) == 0) {
if (strcmp(_lChild->_opType, "Binary") != 0 &&
strcmp(_rChild->_opType, "Binary") != 0) {
count++;
_commutative_id = count; // id should be > 0
}
matched = true;
break;
}
}

// Then check the scalar op if the current op is not in
// the commut_vector_op_list.
if (!matched) {
for (int i = 0; i < scalar_cnt; i++) {
if (strcmp(_opType, commut_op_list[i]) == 0) {
count++;
_commutative_id = count; // id should be > 0
break;
}
}
}
}
}
if( _lChild )
if (_lChild)
_lChild->count_commutative_op(count);
if( _rChild )
if (_rChild)
_rChild->count_commutative_op(count);
}
Copy link
Member

@jatin-bhateja jatin-bhateja May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matcher rules based on proposed masked IR shall look like following
match(Set dst AddVI (Binary vsrc1 vsrc2) mask)

AddVI is still commutative and generated DFA will check for appropriate child states i.e.
kid[0]->state == _Binary_vec_vec && kid[1]->state == _mask

So even if matcher generates additional check by swapping the states of the two child nodes it should still be ok. Can you kindly elaborate the need for this change.

Copy link
Collaborator Author

@XiaohongGong XiaohongGong May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually that's ok if matcher swaps the two operands. Avoiding the swapping could avoid generating unneeded rules which could reduce the whole code size.

src/hotspot/share/opto/matcher.cpp Show resolved Hide resolved
@@ -72,7 +80,8 @@ Node* GraphKit::box_vector(Node* vector, const TypeInstPtr* vbox_type, BasicType
Node* ret = gvn().transform(new ProjNode(alloc, TypeFunc::Parms));

assert(check_vbox(vbox_type), "");
const TypeVect* vt = TypeVect::make(elem_bt, num_elem);
const TypeVect* vt = is_vector_mask(vbox_type->klass()) ?
TypeVect::makemask(elem_bt, num_elem) : TypeVect::make(elem_bt, num_elem);
Copy link
Member

@jatin-bhateja jatin-bhateja May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box type which is an envelope type should still be the same e.g. Int256VectorMask. Value contained in the Masked Box should be of type vectmask.

Copy link
Collaborator Author

@XiaohongGong XiaohongGong May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I don't remember what exactly the issue I met for this part. But one reason of this change is to make it easy for the VectorUnboxNode::Identity, which could directly return the boxed value if the unbox type and the boxed value type is exactly matched:

Node* VectorUnboxNode::Identity(PhaseGVN* phase) {
  Node* n = obj()->uncast();
  if (EnableVectorReboxing && n->Opcode() == Op_VectorBox) {
    if (Type::cmp(bottom_type(), n->in(VectorBoxNode::Value)->bottom_type()) == 0) {
      return n->in(VectorBoxNode::Value); // VectorUnbox (VectorBox v) ==> v
    } else {
      // Handled by VectorUnboxNode::Ideal().
    }
  }
  return this;
}

Using TypeVect for VectorBox/Unbox while TypeVectMask for the boxed value will make the type mismatched. But I think it's not the main problem, we could add some logics to make them matched. @iwanowww do you have any ideas about this part? Thanks!

Copy link
Collaborator Author

@XiaohongGong XiaohongGong May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jatin-bhateja , did you see any issue if the type of VectorBox/VectorUnbox is not the enveloped vector type?

Copy link
Collaborator Author

@XiaohongGong XiaohongGong May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out another reason why I need to set the TypeVectMask for VectorBox/Unbox. This is mainly used to make the assertions for mask inputs work well when creating a masked node, like the assertion in:

StoreVectorMaskedNode(Node* c, Node* mem, Node* dst, Node* src, const TypePtr* at, Node* mask)
   : StoreVectorNode(c, mem, dst, at, src) {
    assert(mask->bottom_type()->isa_vectmask(), "sanity");
    init_class_id(Class_StoreVector);
    set_mismatched_access();
    add_req(mask);
  }

If the StoreVectorMaskedNode is created before expanding the VectorUnbox nodes, the mask value might be a VectorUnbox which has a TypeVect type while not the expected mask type. And then the assertion will fail. We can also remove all the assertions in the constructor methods to make a workaround. But I think it's not a safe way. Any better idea to handle the conflicts? Thanks!

Copy link
Collaborator Author

@XiaohongGong XiaohongGong Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jatin-bhateja , it seems the type of VectorBox/VectorUnbox should be exactly matched with the value boxed. For such an optimization // VectorUnbox (VectorBox v) ==> v, if the type of v is a vector mask type while the VectorUnbox type is expected to be a vector type, the final result of this optimization would change the result type of VectorUnbox to be a mask type. Some type checks or assertion will fail. I met the following issue when I run the jtreg tests after changing type of VectorBox/VectorUnbox back to vector type:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/ent-user/panama-vector/src/hotspot/share/opto/type.cpp:2412), pid=433543, tid=433698
#  Error: assert(base() == v->base()) failed
#
# JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-git-8123588d66)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 17-internal+0-git-8123588d66, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x19f3108]  TypeVect::xmeet(Type const*) const+0x138
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /home/ent-user/jtreg/jtreg-hg/build/images/jtreg/jtwork/scratch/2/core.433543)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

Do you have any strong reason that we must define a vector type for VectorBox/VectorUnbox?

Copy link
Collaborator Author

@XiaohongGong XiaohongGong Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the crash log, I guess the issue happens for a PhiNode with vector masks as its values. Assume one of its value is a VectorUnboxNode with TypeVect while others are mask nodes with TypeVectMask, then the crash can happen when the compiler try to decide the final type of the PhiNode. Please correct me if I misunderstand it. Thanks!

Copy link
Member

@jatin-bhateja jatin-bhateja Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @XiaohongGong,

There could be a problem during scalarize_vbox_node if vector type of mask VectorBoxNode is made as TypeVectMask
https://github.com/jatin-bhateja/panama-vector/blob/628dc716b3f171e1b7b62fd1ec452a3d14a04af9/src/hotspot/share/opto/vector.cpp#L212

There is an ideal transformation which re-associates the Phi node having all its inputs as VectorBoxes of same type.
https://github.com/openjdk/panama-vector/blob/vectorIntrinsics%2Bmask/src/hotspot/share/opto/cfgnode.cpp#L2413

This can lead to a situation where PhiNode is created with TypeVect and inputs are of TypeVectMask, as a workaround I added another transformation which adjusts to bottom type of PhiNode to match with the bottom type of its inputs (TypeVectMask). This change is in my local repo having other masking changes.
https://github.com/jatin-bhateja/panama-vector/blob/JDK-8262356_vectorIntrinsics%2Bmask/src/hotspot/share/opto/cfgnode.cpp#L2433

Copy link
Collaborator Author

@XiaohongGong XiaohongGong Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a problem during scalarize_vbox_node if vector type of mask VectorBoxNode is made as TypeVectMask
https://github.com/jatin-bhateja/panama-vector/blob/628dc716b3f171e1b7b62fd1ec452a3d14a04af9/src/hotspot/share/opto/vector.cpp#L212

I think the main issues that I met are related to VectorUnboxNode while not VectorBoxNode. BTW, the bottom type of VectorBoxNode is the box type while not the vector type. I'm sorry that I cannot see the influence whether the vector type is a TypeVect or TypeVectMask for it. Could you please kindly elaborate more about this?

From my side, setting the TypeVect to VectorBoxNode might be ok, but how about the VectorUnboxNode ? I think they should keep the same with each other?

Copy link
Member

@jatin-bhateja jatin-bhateja Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you on this. It seems that as long as both vector_type of VectorBoxNode and type of VectorUnboxNode comply it should not lead to any issue.
May be one can directly pass the type of the vector being boxed instead of creating a new type during box creation.

diff --git a/src/hotspot/share/opto/vectorIntrinsics.cpp b/src/hotspot/share/opto/vectorIntrinsics.cpp
index 940e87b97ad..d1d0e6f5f24 100644
--- a/src/hotspot/share/opto/vectorIntrinsics.cpp
+++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
@@ -73,8 +73,7 @@ Node* GraphKit::box_vector(Node* vector, const TypeInstPtr* vbox_type, BasicType
   Node* ret = gvn().transform(new ProjNode(alloc, TypeFunc::Parms));

   assert(check_vbox(vbox_type), "");
-  const TypeVect* vt = TypeVect::make(elem_bt, num_elem);
-  VectorBoxNode* vbox = new VectorBoxNode(C, ret, vector, vbox_type, vt);
+  VectorBoxNode* vbox = new VectorBoxNode(C, ret, vector, vbox_type, vector->bottom_type()->is_vect());
   return gvn().transform(vbox);
 }

Copy link
Collaborator Author

@XiaohongGong XiaohongGong Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for the suggestion! Like what you suggested for this PR, maybe we can revisit this part as well in future if it needs any enhancement!

@@ -87,7 +96,8 @@ Node* GraphKit::unbox_vector(Node* v, const TypeInstPtr* vbox_type, BasicType el
return NULL; // no nulls are allowed
}
assert(check_vbox(vbox_type), "");
const TypeVect* vt = TypeVect::make(elem_bt, num_elem);
const TypeVect* vt = is_vector_mask(vbox_type->klass()) ?
TypeVect::makemask(elem_bt, num_elem) : TypeVect::make(elem_bt, num_elem);
Node* unbox = gvn().transform(new VectorUnboxNode(C, vt, v, merged_memory(), shuffle_to_vector));
return unbox;
Copy link
Member

@jatin-bhateja jatin-bhateja May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Collaborator

@iwanowww iwanowww left a comment

Can you elaborate, please, what's the purpose of new nodes?

There's some duplication with existing vector nodes and I'd like to understand how do you intend to merge them.

LoadVectorMask/StoreVectorMask duplicate VectorLoadMask (LoadVector)/StoreVector (VectorStoreMask). What kind of benefits do you expect from exposing the operation as a single node?

Depending on how MaskToVector/VectorToMask are specified, they can duplicate VectorStoreMask/VectorLoadMask.

If mask value always has a vector type,AndVMask/OrVMask/XorVMask can be replaced by AndV/OrV/XorV and special implementations for 2 representations (canonical and native). Same considerations apply to VectorCmpMaskGen (compared to VectorMaskCmp).

What is left is MaskAll, but its purpose is not evident to me. Broadcast, but for masks?

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented May 17, 2021

Can you elaborate, please, what's the purpose of new nodes?

There's some duplication with existing vector nodes and I'd like to understand how do you intend to merge them.

Sure. I added some new nodes for the vector mask generations and operations. Their bottom type are the new added mask type TypeVectMask. These nodes are created only for platforms that support predicate feature. For other platforms, it still uses the existing vector nodes (with TypeVect as the bottom type) for masks. An alternative solution for the mask IRs is to use the same vector nodes by setting different bottom types for platforms that support predicate feature and others. To make it not mixture with the vector operations, we choose to separate the mask with vectors by adding new nodes.

LoadVectorMask/StoreVectorMask duplicate VectorLoadMask (LoadVector)/StoreVector (VectorStoreMask). What kind of benefits do you expect from exposing the operation as a single node?

Depending on how MaskToVector/VectorToMask are specified, they can duplicate VectorStoreMask/VectorLoadMask.

We added the transformation for mask loading/storing: VectorLoadMask (LoadVector)/StoreVector (VectorStoreMask) ==> LoadVectorMask/StoreVectorMask. To be honest, this seems only benefit Arm SVE. Different with AVX-512, SVE predicate works based on the basic element type. Since the memory type for mask is boolean, while the vector data type might be byte, short, int, long, float, double, we always need to unpack the boolean values to the relative data types when converting vector mask to predicate. So the SVE codes for VectorLoadMask (LoadVector) with int type is:

  mov x0, #64
  whilelo p0.b, zr, x0
  ld1b  z16.b, p0, [address]   // LoadVector  (since the mask vector length is less than the max vector length,
                               // we need the predicate to implement the partial loading)

  uunpklo z16.h, z16.h
  uunpklo z16.s, z16.s         // VectorLoadMask
  cmpne  p0.s, z16.s, #0       // VectorToMask

Since SVE supports the load extended instruction, we can use one node LoadVectorMask to load the mask values from memory and extend the values to the data type. The above codes could be optimized to:

  ld1b  z16.s, p7, [address]   // LoadVectorMask  (load mask values from boolean type memory, and extends the values to int type)
  cmpne p0.s, z16.s, #0        // VectorToMask

Note that MaskToVector/VectorToMask are different with VectorStoreMask/VectorLoadMask, which only simply do the convertions between mask and vector with the same element type, while VectorStoreMask/VectorLoadMask must contain the type casting between mask and vector which is also needed for SVE.

Although the existing pattern can work well for AVX-512. Is it possible to use the same optimized pattern for it as well @jatin-bhateja ?

If mask value always has a vector type,AndVMask/OrVMask/XorVMask can be replaced by AndV/OrV/XorV and special implementations for 2 representations (canonical and native). Same considerations apply to VectorCmpMaskGen (compared to VectorMaskCmp).

We added AndVMask/OrVMask/XorVMask for mask logical operations finally to separate with the vector logical operations. It might be confusing if AndV/OrV/XorV could might be TypeVect or TypeMaskVect for the same plaftform.

As a conclusion, actually we are not sure what is the best optimal way to represent the mask operations, but we prefer to new mask IRs (definitely with TypeVectMask) for all mask operations while not just change the bottom type of the existing vector nodes. Any ideas about it? It will be absolutely helpful if you could give us more advice. Thanks so much!

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jun 18, 2021

Hi @iwanowww @jatin-bhateja ,

Regarding to the mask IRs for SVE/AVX-512, it's ok for me to reuse most of the existing IRs both for SVE/AVX-512 and other platforms. I agree that it can work well by defining different types for different platforms (i.e. TypeVectMask for SVE/AVX-512 while TypeVect for others). The benefit is that it doesn't need to add more new IRs. And the compiler can define the right register for vector mask according to its bottom type. So some mask related IRs (VectorLoadMask, VectorMaskCmp, VectorMaskCast) can be reused.

However, to separate the vector and vector mask operations for SVE/AVX-512, we think it's better to define new IRs for vector mask for nodes that could be both vector and vector mask like (AndV/OrV/XorV/Replicate). So the following new IRs will be added for mask:

AndVMask, OrVMask, XorVMask, MaskAll

These IRs can also be used for other platforms that do not support predicate features.

Further more, considering the performance of vector mask loading/storing with SVE, the following optimizations are also needed:

   fromArray:  LoadVector + VectorLoadMask    ==>   LoadVectorMask     (load + type extending)                     
   intoArray:  VectorStoreMask + StoreVector   ==>   StoreVectorMask     (type narrowing + store)

Does this solution make sense to you? Any feedback from you are helpful and welcome, and hope we can have a conclusion for the mask IR definition part. Thanks so much!

Best Regards,
Xiaohong

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jun 21, 2021

Hi there, the new commit refines the C2 compiler IRs for vector mask. The main idea is trying to use the same nodes but with
different types (i.e. use "TypeVectMask" for SVE/AVX-512, while "TypeVect" for others). It doesn't need to create new IRs specially for SVE/AVX-512 predicate type. The "idea_reg" is decided by the type of the node, so that the compiler can choose the right register for vector mask during codegen.

For more details, the following existed IRs are kept to use normally:

 - fromArray: LoadVector + VectorLoadMask
 - compare:   VectorMaskCmp
 - others:    VectorMaskCast

And to spearate the IRs for vector and vector mask, the following mask IRs are added:

 - maskAll:    MaskAll
 - and/or/xor: AndVMask/OrVMask/XorVMask

Further more, to improve the performance of vector mask loading/storing with SVE, the following transformations are also kept:

 - fromArray:
     LoadVector + VectorLoadMask    ==>  LoadVectorMask
 - intoArray:
     VectorStoreMask + StoreVector  ==>  StoreVectorMask

Any feedback is welcome! Thanks so much!

Best Regards,
Xiaohong

@jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Jun 24, 2021

Can you elaborate, please, what's the purpose of new nodes?
There's some duplication with existing vector nodes and I'd like to understand how do you intend to merge them.

Sure. I added some new nodes for the vector mask generations and operations. Their bottom type are the new added mask type TypeVectMask. These nodes are created only for platforms that support predicate feature. For other platforms, it still uses the existing vector nodes (with TypeVect as the bottom type) for masks. An alternative solution for the mask IRs is to use the same vector nodes by setting different bottom types for platforms that support predicate feature and others. To make it not mixture with the vector operations, we choose to separate the mask with vectors by adding new nodes.

LoadVectorMask/StoreVectorMask duplicate VectorLoadMask (LoadVector)/StoreVector (VectorStoreMask). What kind of benefits do you expect from exposing the operation as a single node?
Depending on how MaskToVector/VectorToMask are specified, they can duplicate VectorStoreMask/VectorLoadMask.

We added the transformation for mask loading/storing: VectorLoadMask (LoadVector)/StoreVector (VectorStoreMask) ==> LoadVectorMask/StoreVectorMask. To be honest, this seems only benefit Arm SVE. Different with AVX-512, SVE predicate works based on the basic element type. Since the memory type for mask is boolean, while the vector data type might be byte, short, int, long, float, double, we always need to unpack the boolean values to the relative data types when converting vector mask to predicate. So the SVE codes for VectorLoadMask (LoadVector) with int type is:

  mov x0, #64
  whilelo p0.b, zr, x0
  ld1b  z16.b, p0, [address]   // LoadVector  (since the mask vector length is less than the max vector length,
                               // we need the predicate to implement the partial loading)

  uunpklo z16.h, z16.h
  uunpklo z16.s, z16.s         // VectorLoadMask
  cmpne  p0.s, z16.s, #0       // VectorToMask

Since SVE supports the load extended instruction, we can use one node LoadVectorMask to load the mask values from memory and extend the values to the data type. The above codes could be optimized to:

  ld1b  z16.s, p7, [address]   // LoadVectorMask  (load mask values from boolean type memory, and extends the values to int type)
  cmpne p0.s, z16.s, #0        // VectorToMask

Note that MaskToVector/VectorToMask are different with VectorStoreMask/VectorLoadMask, which only simply do the convertions between mask and vector with the same element type, while VectorStoreMask/VectorLoadMask must contain the type casting between mask and vector which is also needed for SVE.

Although the existing pattern can work well for AVX-512. Is it possible to use the same optimized pattern for it as well @jatin-bhateja ?

Hi @XiaohongGong , I just missed noticing this query earlier, apologies!, I think VectorLoadMask and VectorStoreMask serves the purpose to load/store a raw mask (boolean array) from/to a vector/predicated register. An explicit mask casting IR (VectorMaskCastNode) has been introduced recently which can be plugged after loading mask value if casting is needed.

As you mentioned there is a little use for LoadVectorMask/StoreVectorMask from X86 prespective.

If mask value always has a vector type,AndVMask/OrVMask/XorVMask can be replaced by AndV/OrV/XorV and special implementations for 2 representations (canonical and native). Same considerations apply to VectorCmpMaskGen (compared to VectorMaskCmp).

We added AndVMask/OrVMask/XorVMask for mask logical operations finally to separate with the vector logical operations. It might be confusing if AndV/OrV/XorV could might be TypeVect or TypeMaskVect for the same plaftform.

As a conclusion, actually we are not sure what is the best optimal way to represent the mask operations, but we prefer to new mask IRs (definitely with TypeVectMask) for all mask operations while not just change the bottom type of the existing vector nodes. Any ideas about it? It will be absolutely helpful if you could give us more advice. Thanks so much!

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jun 24, 2021

Hi @XiaohongGong , I just missed noticing this query earlier, apologies!, I think VectorLoadMask and VectorStoreMask serves the purpose to load/store a raw mask (boolean array) from/to a vector/predicated register. An explicit mask casting IR (VectorMaskCastNode) has been introduced recently which can be plugged after loading mask value if casting is needed.

It's ok. I think it's ok to use VectorMaskCastNode (actually we think this node is more reasonable instead of VectorLoadMask/VectorStoreMask). It needs a further consideration for it since we'd better to revisit its usage for all platforms while not only SVE.

@jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Jun 30, 2021

Hi @XiaohongGong , I think there are still few opens like StoreVectorMask/LoadVectorMask which may be specific for ARM or creation of different nodes for logic mask operation (OrVMask/AndVMask/XorVMask) etc, since we are anyways working on a forked branch from vectorIntrinsics any further tuning can be done later and it will give us an opportunity to start the backend work using common IR.

Hi @PaulSandoz , @sviswa7 , what are your thoughts.

@sviswa7
Copy link
Collaborator

@sviswa7 sviswa7 commented Jun 30, 2021

@XiaohongGong Please go ahead and integrate the code so we can proceed with the work.
It will be good to file JBS entries to revisit StoreVectorMask/LoadVectorMask and OrVMask/AndVMask/XorVMask creation.

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented Jun 30, 2021

Hi @PaulSandoz , @sviswa7 , what are your thoughts.

It's ok if its not perfect (most importantly it has to build), if there is rough consensus via approval lets iterate on the branch and make incremental improvements.

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jul 1, 2021

@XiaohongGong Please go ahead and integrate the code so we can proceed with the work.
It will be good to file JBS entries to revisit StoreVectorMask/LoadVectorMask and OrVMask/AndVMask/XorVMask creation.

OK. Agree! I will do a merge since there are some conflicts again. Thanks!

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jul 1, 2021

@PaulSandoz @sviswa7 @jatin-bhateja , I'v rebased this PR again and resolved the conflicts. Could you please give an approve if it can be merged? Thanks so much!

sviswa7
sviswa7 approved these changes Jul 2, 2021
@sviswa7
Copy link
Collaborator

@sviswa7 sviswa7 commented Jul 2, 2021

@XiaohongGong Please go ahead and merge.

@openjdk
Copy link

@openjdk openjdk bot commented Jul 2, 2021

@XiaohongGong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8266287: Basic mask IR implementation for the Vector API masking feature support

Reviewed-by: sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the vectorIntrinsics+mask branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the vectorIntrinsics+mask branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jul 2, 2021
@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jul 2, 2021

Thanks for the review @jatin-bhateja @sviswa7 @iwanowww @PaulSandoz !

@XiaohongGong
Copy link
Collaborator Author

@XiaohongGong XiaohongGong commented Jul 2, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Jul 2, 2021

Going to push as commit e53ea26.

@openjdk openjdk bot closed this Jul 2, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Jul 2, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jul 2, 2021

@XiaohongGong Pushed as commit e53ea26.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@XiaohongGong XiaohongGong deleted the JDK-8266287 branch Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants