- 
                Notifications
    
You must be signed in to change notification settings  - Fork 6.1k
 
8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE #27481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| 
          
 👋 Welcome back xgong! A progress list of the required criteria for merging this PR into   | 
    
| 
           ❗ This change is not yet ready to be integrated.  | 
    
| 
           @XiaohongGong The following label will be automatically applied to this pull request: 
 When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.  | 
    
          
Webrevs
  | 
    
| 
           Hi, could anyone please help take a look at this PR? Thanks a lot in advance!  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, reviewed internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a quick glance, and had some comments.
I'll run some testing, and review more fully after :)
| 
           @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state?  | 
    
          
 Thanks for looking at this PR @eme64 ! I'v rebased the PR to master and addressed your comments. Please let me know if any other issues.  | 
    
| 
           @XiaohongGong Thanks for merging, running testing now :)  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests passed :)
Now I have some understanding questions ;)
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | ||
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does VectorStoreMaskNode do exactly?
Could you maybe add some short comment above the class definition of VectorStoreMaskNode?
I'm guessing it turns a predicate into a packed vector, right?
If that is correct, then it would make more sense to check something like
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | |
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | |
| } | |
| if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { | |
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | |
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the name VectorStoreMaskNode is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
}
Is the function name vector_mask_must_be_packed fine to you? This looks smarter to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @XiaohongGong I am a bit confused with this condition here -
mask_vec->bottom_type()->isa_vectmask() == nullptr
So this means that mask_vec is not of type TypeVectMask right? Which means it is not a vector predicate/mask type? Then how can the VectorStoreMaskNode convert mask_vec predicate to a packed vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eme64 , I updated a commit with renaming the matcher function to mask_op_uses_packed_vector. Is this fine to you? The main concern here is that only the specified vector mask ops (VectorMaskOpNode) need the packed vector mask. Name vector_mask_must_be_packed might extend the scope to all vector/mask operations.
Sounds, good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?
@XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to vectornode.hpp. I also see that TypeVectMask has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eme64 , I'm afraid that there is not a place that we document these things now. And I agree that clearly comments might be necessary. I'v created a separate JBS to record https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @PaulSandoz has a good idea for a better naming of
VectorLoadMaskandVectorStoreMask?
IIUC these nodes represent conversions or casts:
VectorLoadMask- converts a vector register of 8-bit lanes representing a mask to a platform-specific mask registerVectorStoreMask- converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask
In theory we could model such conversations using VectorOperators as we do other conversions, which might hold some clues as to their names. There is already VectorMaskCastNode, but i believe that operates on the platform-specific mask register, casting between different vector species of the same length.
So perhaps we could rename to the following:
VectorLoadMask->VectorCastB2MaskNodeVectorStoreMask->VectorCastMask2BNode
Having a naming convention for the various mask representations might further help and influence those names:
BVectMask, vector register of 8-bit lanes representing the maskNVectMask, vector register of N-bit lanes representing the mask; andPVectMask, representing the platform-specific predicate/mask register, which might be the same asNVectMaskon certain hardware.
Does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulSandoz That sounds like a great idea!
| 
           Hi @fg1417 , @Bhavana-Kilambi could you please help take a look at this PR especially the backend changes? Thanks a lot!  | 
    
| // These ops are implemented with predicate instructions if input | ||
| // mask is a predciate. | ||
| return vt->isa_vectmask() == nullptr; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had an assert above, what else that vt could be other than vectmask, it would help in understanding this logic here ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vt is one of the normal TypeVect (i.e. TypeVectA|S|D|X|Y|Z) based on the vector length in bytes like other vector nodes. It is a a kind of TypeVect on architectures that do not support predicate feature. The mask is represented as the same with a vector on those platforms.
| 
           Hi @eme64 , I updated a commit to rename the helper matcher function and add some comments, assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot!  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XiaohongGong Thanks for the updates. I left a few more comments.
And thanks for filing:
https://bugs.openjdk.org/browse/JDK-8370666
Are you planning on working on that, or do you know someone else?
I could try, but I'm less familiar with all the concepts, and would need a lot of help.
| return false; | ||
| } | ||
| 
               | 
          ||
| assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE"); | |
| assert(vt->isa_vectmask() != nullptr, "The mask type must be a TypeVectMask on SVE"); | 
Hotspot style guide does not like implicit null/zero checks ;)
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | ||
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulSandoz That sounds like a great idea!
          
 Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?  | 
    
          
 That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :)  | 
    
          
 OK, I will try my best starting with it a few weeks later. Thanks!  | 
    
| // Identify if a vector mask operation prefers the input/output mask to be | ||
| // saved with a predicate type or not. | ||
| // - Return true if it prefers a predicate type (i.e. TypeVectMask). | ||
| // - Return false if it prefers a general vector type (i.e. TypeVectA to TypeVectZ). | ||
| static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks much clearer now, thanks for the updates :)
I'll have a look at the whole PR at a later point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this patch looks reasonable, but I'm not a aarch64 or x64 specialist for these ops.
I think we have sufficient aarch64 specialists look at this already.
But I'd like to ping @sviswa7 and @jatin-bhateja to sanity check the x64 changes, IR rules etc :)
After approval from x64 folks, I can offer to do some internal testing :)
| // Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler. | ||
| long outputLong = vmask.not().toLong(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds a bit fragile. Is there something that would catch if it did ever get optimized away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. But currently fromLong + toLong would be identified to a long input:
jdk/src/hotspot/share/opto/vectornode.cpp
Lines 1926 to 1931 in 6347f10
| Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) { | |
| if (in(1)->Opcode() == Op_VectorLongToMask) { | |
| return in(1)->in(1); | |
| } | |
| return this; | |
| } | 
So the original tests cannot test these two APIs exactly. But as a smoke test, it was used to verify the correctness of java-level APIs instead of the hotspot intrinsification.
| 
           Hi @sviswa7, @jatin-bhateja , could you please help take a look at this PR especially the X86 changes? Thanks so much! Hi @PaulSandoz , @sviswa7, would you mind taking look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance!  | 
    
The current implementations of
VectorMask.fromLong()andtoLong()on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input offromLong/toLongare defined as masks with predicate registers on SVE architectures.For
toLong(), the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. ForfromLong(), the opposite conversion is needed at the start of codegen.These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.
This patch optimizes the implementation by leveraging two existing C2 IRs (
VectorLoadMask/VectorStoreMask) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.It also modifies the Vector API jtreg tests for well testing. Here is the details:
fromLong/toLongto make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs forfromLongandtoLongare optimized out completely by compiler due to following IR identity:Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2.
fromLongrequires "svebitperm" instead of "sve2".Performance shows significant improvement on NVIDIA's Grace CPU.
Here is the performance data with
-XX:UseSVE=2:And here is the performance data with
-XX:UseSVE=1:Progress
Issue
Reviewers
Reviewers without OpenJDK IDs
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481$ git checkout pull/27481Update a local copy of the PR:
$ git checkout pull/27481$ git pull https://git.openjdk.org/jdk.git pull/27481/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27481View PR using the GUI difftool:
$ git pr show -t 27481Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27481.diff
Using Webrev
Link to Webrev Comment