-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE #27481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back xgong! A progress list of the required criteria for merging this PR into |
|
@XiaohongGong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 315 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@XiaohongGong The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
|
Hi, could anyone please help take a look at this PR? Thanks a lot in advance! |
shqking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your work.
erifan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, reviewed internally.
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a quick glance, and had some comments.
I'll run some testing, and review more fully after :)
|
@XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state? |
Thanks for looking at this PR @eme64 ! I'v rebased the PR to master and addressed your comments. Please let me know if any other issues. |
|
@XiaohongGong Thanks for merging, running testing now :) |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests passed :)
Now I have some understanding questions ;)
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | ||
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does VectorStoreMaskNode do exactly?
Could you maybe add some short comment above the class definition of VectorStoreMaskNode?
I'm guessing it turns a predicate into a packed vector, right?
If that is correct, then it would make more sense to check something like
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | |
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | |
| } | |
| if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { | |
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the name VectorStoreMaskNode is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
}
Is the function name vector_mask_must_be_packed fine to you? This looks smarter to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @XiaohongGong I am a bit confused with this condition here -
mask_vec->bottom_type()->isa_vectmask() == nullptr
So this means that mask_vec is not of type TypeVectMask right? Which means it is not a vector predicate/mask type? Then how can the VectorStoreMaskNode convert mask_vec predicate to a packed vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eme64 , I updated a commit with renaming the matcher function to mask_op_uses_packed_vector. Is this fine to you? The main concern here is that only the specified vector mask ops (VectorMaskOpNode) need the packed vector mask. Name vector_mask_must_be_packed might extend the scope to all vector/mask operations.
Sounds, good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?
@XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to vectornode.hpp. I also see that TypeVectMask has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eme64 , I'm afraid that there is not a place that we document these things now. And I agree that clearly comments might be necessary. I'v created a separate JBS to record https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @PaulSandoz has a good idea for a better naming of
VectorLoadMaskandVectorStoreMask?
IIUC these nodes represent conversions or casts:
VectorLoadMask- converts a vector register of 8-bit lanes representing a mask to a platform-specific mask registerVectorStoreMask- converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask
In theory we could model such conversations using VectorOperators as we do other conversions, which might hold some clues as to their names. There is already VectorMaskCastNode, but i believe that operates on the platform-specific mask register, casting between different vector species of the same length.
So perhaps we could rename to the following:
VectorLoadMask->VectorCastB2MaskNodeVectorStoreMask->VectorCastMask2BNode
Having a naming convention for the various mask representations might further help and influence those names:
BVectMask, vector register of 8-bit lanes representing the maskNVectMask, vector register of N-bit lanes representing the mask; andPVectMask, representing the platform-specific predicate/mask register, which might be the same asNVectMaskon certain hardware.
Does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulSandoz That sounds like a great idea!
|
Hi @fg1417 , @Bhavana-Kilambi could you please help take a look at this PR especially the backend changes? Thanks a lot! |
|
Hi @eme64 , I updated a commit to rename the helper matcher function and add some comments, assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot! |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XiaohongGong Thanks for the updates. I left a few more comments.
And thanks for filing:
https://bugs.openjdk.org/browse/JDK-8370666
Are you planning on working on that, or do you know someone else?
I could try, but I'm less familiar with all the concepts, and would need a lot of help.
| if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { | ||
| mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulSandoz That sounds like a great idea!
Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you? |
That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :) |
OK, I will try my best starting with it a few weeks later. Thanks! |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this patch looks reasonable, but I'm not a aarch64 or x64 specialist for these ops.
I think we have sufficient aarch64 specialists look at this already.
But I'd like to ping @sviswa7 and @jatin-bhateja to sanity check the x64 changes, IR rules etc :)
After approval from x64 folks, I can offer to do some internal testing :)
| // Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler. | ||
| long outputLong = vmask.not().toLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds a bit fragile. Is there something that would catch if it did ever get optimized away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. But currently fromLong + toLong would be identified to a long input:
jdk/src/hotspot/share/opto/vectornode.cpp
Lines 1926 to 1931 in 6347f10
| Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) { | |
| if (in(1)->Opcode() == Op_VectorLongToMask) { | |
| return in(1)->in(1); | |
| } | |
| return this; | |
| } |
So the original tests cannot test these two APIs exactly. But as a smoke test, it was used to verify the correctness of java-level APIs instead of the hotspot intrinsification.
|
Hi @sviswa7, @jatin-bhateja , could you please help take a look at this PR especially the X86 changes? Thanks so much! Hi @PaulSandoz , @sviswa7, would you mind taking look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! |
sviswa7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Thanks so much for your review! Hi @eme64 , could you please help take an internal testing for this PR again? If there are any other inputs, please let me know. Thanks a lot! |
I recommend reverting changes to the smoke test, as i don't think that is sufficient. Since you already have IR tests in place we can follow up with a proper set of unit tests operating over long arrays like we do for boolean and with various mask operations. If you log the follow up issue i might be able to find someone else to write those tests. |
Sounds reasonable to me @PaulSandoz ! I will revert the smoke tests in next commit. And I'v filed another JBS as a follow-up for the unit tests: https://bugs.openjdk.org/browse/JDK-8371446. Thanks a lot! |
|
Hi @PaulSandoz , I'v reverted the smoke test changes in latest commit. Would you mind taking another look? Thanks! |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me now. Thanks for all the updates!
I'll run some internal testing before approving :)
Sounds good! Thanks so much for your testing! |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal tests pass (just sanity testing, did not run it on SVE). Code looks reasonable.
@XiaohongGong Thanks for all the updates and bearing with all the review comments 😊
Thanks for all your comments and testing. I also tested it with kinds of SVE environments locally. |
Tested the patch on AWS Graviton4 with the benchmarks provided, and the results match the reported numbers. With With |
|
/integrate |
|
Going to push as commit 676e6fd.
Your commit was automatically rebased without conflicts. |
|
@XiaohongGong Pushed as commit 676e6fd. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The current implementations of
VectorMask.fromLong()andtoLong()on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input offromLong/toLongare defined as masks with predicate registers on SVE architectures.For
toLong(), the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. ForfromLong(), the opposite conversion is needed at the start of codegen.These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.
This patch optimizes the implementation by leveraging two existing C2 IRs (
VectorLoadMask/VectorStoreMask) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.It also modifies the Vector API jtreg tests for well testing. Here is the details:
fromLong/toLongto make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs forfromLongandtoLongare optimized out completely by compiler due to following IR identity:Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2.
fromLongrequires "svebitperm" instead of "sve2".Performance shows significant improvement on NVIDIA's Grace CPU.
Here is the performance data with
-XX:UseSVE=2:And here is the performance data with
-XX:UseSVE=1:Progress
Issue
Reviewers
Reviewers without OpenJDK IDs
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481$ git checkout pull/27481Update a local copy of the PR:
$ git checkout pull/27481$ git pull https://git.openjdk.org/jdk.git pull/27481/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27481View PR using the GUI difftool:
$ git pr show -t 27481Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27481.diff
Using Webrev
Link to Webrev Comment