@XiaohongGong XiaohongGong commented Sep 25, 2025

The current implementations of VectorMask.fromLong() and toLong() on AArch64 SVE are inefficient. SVE does not support native predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output of fromLong and the input of toLong are defined as masks held in predicate registers on SVE architectures.

For toLong(), the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For fromLong(), the opposite conversion is needed at the start of codegen.

These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.

This patch optimizes the implementation by leveraging two existing C2 IRs (VectorLoadMask/VectorStoreMask) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.

It also modifies the Vector API jtreg tests to improve coverage. The details are:

  1. Fix the smoke tests of fromLong/toLong so that these APIs are actually tested. They were not well tested before, because in the original test the C2 IRs for fromLong and toLong are optimized out completely by the compiler due to the following IR identity:
  VectorMaskToLong (VectorLongToMask l) => l

  In addition, a warmup loop is necessary to guarantee that the APIs are compiled by C2.

  2. Refine the existing IR tests to verify the expected IR patterns after this patch. They now also use the exact required CPU feature on AArch64 for these ops: fromLong requires "svebitperm" instead of "sve2".
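
To illustrate why the round trip in item 1 folds away, here is a minimal plain-Java sketch of the lane/bit mapping. It is only a model: the class MaskLongModel and its methods are hypothetical stand-ins that use a boolean array, not the jdk.incubator.vector implementation, and it assumes fewer than 64 lanes.

```java
// Plain-Java model of the lane/bit mapping; not the jdk.incubator.vector implementation.
final class MaskLongModel {
    // Hypothetical stand-in for VectorMask.fromLong: lane i is set iff bit i is set.
    static boolean[] fromLong(long bits, int laneCount) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Hypothetical stand-in for VectorMask.toLong: bit i is set iff lane i is set.
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Hypothetical stand-in for VectorMask.not(): flips every lane.
    static boolean[] not(boolean[] lanes) {
        boolean[] flipped = new boolean[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            flipped[i] = !lanes[i];
        }
        return flipped;
    }

    public static void main(String[] args) {
        int laneCount = 16; // e.g. a 128-bit vector of bytes
        long input = 0xA0F5L;
        // The round trip returns the input, so a compiler may replace the pair by the input.
        System.out.println(toLong(fromLong(input, laneCount)) == input);
        // With not() in between, the result is the complement within laneCount bits.
        long expected = ~input & ((1L << laneCount) - 1);
        System.out.println(toLong(not(fromLong(input, laneCount))) == expected);
    }
}
```

Because the second expression is no longer a plain round trip, it cannot be reduced to the input by that identity, which is the idea behind inserting an extra mask operation in the smoke test.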

Performance shows significant improvement on NVIDIA's Grace CPU.

Here is the performance data with -XX:UseSVE=2:

Benchmark                                   bits inputs Mode   Unit     Before       After    Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms  322151.976  1318576.736 4.09
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms  322187.144  1315736.931 4.08
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms  322213.330  1353272.882 4.19
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 1009426.292  1339834.833 1.32
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 1010311.371  1368379.465 1.35
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 1013333.729  1368077.534 1.35
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms  892649.449  1301954.698 1.45
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms  894593.615  1324922.719 1.48
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms  884498.938  1289828.319 1.45
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 1093444.011  1374164.132 1.25
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 1080117.255  1369234.390 1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 1076327.072  1373219.435 1.27

And here is the performance data with -XX:UseSVE=1:

Benchmark                                   bits inputs Mode   Unit   Before        After     Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms 686584.179   800329.010  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms 686184.083   801754.893  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms 686426.883   799058.199  1.16
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 945359.331  1179824.693  1.24
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 946546.502  1169208.723  1.23
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 943207.037  1176056.895  1.24
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms 874121.577  1179473.834  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms 881023.640  1180854.086  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms 880149.334  1160048.226  1.31
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 938451.594  1164668.529  1.24
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 939189.649  1187096.328  1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 938601.147  1181154.558  1.25
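
For reference, the Gain column in both tables is the ratio After / Before. The listed values appear to be truncated to two decimals rather than rounded (for example, 1353272.882 / 322213.330 is roughly 4.1999 and is listed as 4.19). A small sketch of that arithmetic, with a hypothetical helper name:

```java
// Recomputes the Gain column from two rows of the tables above.
final class GainCheck {
    // Throughput gain as After / Before, truncated (not rounded) to two decimals.
    static double gain(double before, double after) {
        return Math.floor(after / before * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // testToLongByte, 128 bits, inputs = 1, -XX:UseSVE=2
        System.out.println(gain(322151.976, 1318576.736));
        // testToLongByte, 128 bits, inputs = 1, -XX:UseSVE=1
        System.out.println(gain(686584.179, 800329.010));
    }
}
```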

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE (Enhancement - P4)

Reviewers

Reviewers without OpenJDK IDs

  • @erifan (no known openjdk.org user name / role) 🔄 Re-review required (review applies to 25538369)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481
$ git checkout pull/27481

Update a local copy of the PR:
$ git checkout pull/27481
$ git pull https://git.openjdk.org/jdk.git pull/27481/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27481

View PR using the GUI difftool:
$ git pr show -t 27481

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27481.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Sep 25, 2025

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Sep 25, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 25, 2025

openjdk bot commented Sep 25, 2025

@XiaohongGong The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 25, 2025

mlbridge bot commented Sep 25, 2025

Webrevs

@XiaohongGong
Author

Hi, could anyone please help take a look at this PR? Thanks a lot in advance!

Contributor

@shqking shqking left a comment

LGTM. Thanks for your work.

Contributor

@erifan erifan left a comment

LGTM, reviewed internally.

Contributor

@eme64 eme64 left a comment

I gave it a quick glance, and had some comments.

I'll run some testing, and review more fully after :)

@eme64
Contributor

eme64 commented Oct 21, 2025

@XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state?

@XiaohongGong
Author

@XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state?

Thanks for looking at this PR @eme64 ! I've rebased the PR onto master and addressed your comments. Please let me know if there are any other issues.

@eme64
Contributor

eme64 commented Oct 22, 2025

@XiaohongGong Thanks for merging, running testing now :)

Contributor

@eme64 eme64 left a comment

Tests passed :)

Now I have some understanding questions ;)

Comment on lines 625 to 627
if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) {
  mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
}
Contributor

What does VectorStoreMaskNode do exactly?
Could you maybe add some short comment above the class definition of VectorStoreMaskNode?

I'm guessing it turns a predicate into a packed vector, right?
If that is correct, then it would make more sense to check something like

Suggested change
- if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) {
-   mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
- }
+ if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
+   mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
+ }

Contributor

I'm wondering if the name VectorStoreMaskNode is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion?

Author

VectorStoreMask is the opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It converts a vector mask (either an unpacked vector or a predicate) to a packed vector form (i.e. 8-bit element size), because in the Java API the elements of a VectorMask are stored into a boolean array.
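
As a rough illustration of the two layout conversions described above, here is a plain-Java model for 32-bit lanes on a platform without predicate registers. It is a sketch, not HotSpot code: the class and method names are hypothetical, and it assumes that an unpacked true lane is all ones (-1) and a false lane is 0.

```java
import java.util.Arrays;

// Plain-Java model of the two mask layout conversions; not HotSpot code.
final class MaskLayoutModel {
    // Models VectorLoadMask for 32-bit lanes: packed booleans (one 0/1 byte per lane)
    // become unpacked lanes. Assumption: true is all ones (-1), false is 0.
    static int[] loadMask(byte[] packed) {
        int[] lanes = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            lanes[i] = (packed[i] != 0) ? -1 : 0;
        }
        return lanes;
    }

    // Models VectorStoreMask: unpacked lanes go back to one 0/1 byte per lane,
    // the form that corresponds to a Java boolean[].
    static byte[] storeMask(int[] lanes) {
        byte[] packed = new byte[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            packed[i] = (byte) (lanes[i] != 0 ? 1 : 0);
        }
        return packed;
    }

    public static void main(String[] args) {
        byte[] packed = {1, 0, 1, 1};
        int[] unpacked = loadMask(packed);
        System.out.println(Arrays.toString(unpacked));
        System.out.println(Arrays.toString(storeMask(unpacked)));
    }
}
```

On a predicate architecture such as SVE the unpacked side would be a predicate register instead of an int array, which this model does not try to represent.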

Author

if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
  mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
}

Is the function name vector_mask_must_be_packed fine with you? This looks better to me.

Contributor

Hi @XiaohongGong I am a bit confused with this condition here -

mask_vec->bottom_type()->isa_vectmask() == nullptr

So this means that mask_vec is not of type TypeVectMask right? Which means it is not a vector predicate/mask type? Then how can the VectorStoreMaskNode convert mask_vec predicate to a packed vector?

Contributor

Hi @eme64 , I pushed a commit that renames the matcher function to mask_op_uses_packed_vector. Is this fine with you? The main concern here is that only the specified vector mask ops (VectorMaskOpNode) need the packed vector mask. The name vector_mask_must_be_packed might extend the scope to all vector/mask operations.

Sounds good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false?

Contributor

Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?

@XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to vectornode.hpp. I also see that TypeVectMask has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier.

Author

@XiaohongGong XiaohongGong Oct 27, 2025

Hi @eme64 , I'm afraid there is no place where we document these things now. I agree that clear comments are necessary. I've created a separate JBS issue to track this: https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion!

Member

Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?

IIUC these nodes represent conversions or casts:

  • VectorLoadMask - converts a vector register of 8-bit lanes representing a mask to a platform-specific mask register
  • VectorStoreMask - converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask

In theory we could model such conversions using VectorOperators as we do other conversions, which might hold some clues as to their names. There is already VectorMaskCastNode, but I believe that operates on the platform-specific mask register, casting between different vector species of the same length.

So perhaps we could rename to the following:

  • VectorLoadMask -> VectorCastB2MaskNode
  • VectorStoreMask -> VectorCastMask2BNode

Having a naming convention for the various mask representations might further help and influence those names:

  • BVectMask, vector register of 8-bit lanes representing the mask
  • NVectMask, vector register of N-bit lanes representing the mask; and
  • PVectMask, representing the platform-specific predicate/mask register, which might be the same as NVectMask on certain hardware.

Does that help?

Contributor

@PaulSandoz That sounds like a great idea!

@XiaohongGong
Author

Hi @fg1417 , @Bhavana-Kilambi could you please help take a look at this PR, especially the backend changes? Thanks a lot!

Comment on lines 400 to 402
// These ops are implemented with predicate instructions if input
// mask is a predciate.
return vt->isa_vectmask() == nullptr;
Contributor

If we had an assert above stating what else vt could be other than a vectmask, it would help in understanding the logic here ;)

Author

vt is one of the normal TypeVect types (i.e. TypeVectA|S|D|X|Y|Z), chosen by the vector length in bytes as for other vector nodes. It is a kind of TypeVect on architectures that do not support the predicate feature. On those platforms the mask is represented the same way as a vector.

@XiaohongGong
Author

Hi @eme64 , I pushed a commit that renames the helper matcher function and adds some comments and an assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot!

Contributor

@eme64 eme64 left a comment

@XiaohongGong Thanks for the updates. I left a few more comments.

And thanks for filing:
https://bugs.openjdk.org/browse/JDK-8370666
Are you planning on working on that, or do you know someone else?
I could try, but I'm less familiar with all the concepts, and would need a lot of help.

return false;
}

assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");
Contributor

Suggested change
- assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");
+ assert(vt->isa_vectmask() != nullptr, "The mask type must be a TypeVectMask on SVE");

Hotspot style guide does not like implicit null/zero checks ;)

@XiaohongGong
Author

@XiaohongGong Thanks for the updates. I left a few more comments.

And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.

Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?

@eme64
Contributor

eme64 commented Oct 29, 2025

@XiaohongGong Thanks for the updates. I left a few more comments.
And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.

Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?

That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :)

@XiaohongGong
Author

@XiaohongGong Thanks for the updates. I left a few more comments.
And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.

Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?

That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :)

OK, I will do my best to start on it in a few weeks. Thanks!

Comment on lines +336 to +340
// Identify if a vector mask operation prefers the input/output mask to be
// saved with a predicate type or not.
// - Return true if it prefers a predicate type (i.e. TypeVectMask).
// - Return false if it prefers a general vector type (i.e. TypeVectA to TypeVectZ).
static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt);
Contributor

Nice, this looks much clearer now, thanks for the updates :)

I'll have a look at the whole PR at a later point.

Contributor

@eme64 eme64 left a comment

Generally, this patch looks reasonable, but I'm not an aarch64 or x64 specialist for these ops.

I think we have had enough aarch64 specialists look at this already.

But I'd like to ping @sviswa7 and @jatin-bhateja to sanity check the x64 changes, IR rules etc :)

After approval from x64 folks, I can offer to do some internal testing :)

Comment on lines +6849 to +6850
// Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler.
long outputLong = vmask.not().toLong();
Contributor

That sounds a bit fragile. Is there something that would catch if it did ever get optimized away?

Author

I'm not sure. But currently fromLong followed by toLong is folded back to the long input by this identity:

Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) {
  if (in(1)->Opcode() == Op_VectorLongToMask) {
    return in(1)->in(1);
  }
  return this;
}

So the original tests cannot really exercise these two APIs. But as a smoke test, it was meant to verify the correctness of the Java-level APIs rather than the HotSpot intrinsification.
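
To make the effect of that identity concrete, here is a small Java model of the folding rule. It is only a sketch: the Node class is a hypothetical stand-in for C2 nodes, the op names are plain strings, and "VectorMaskNot" is a placeholder for whatever IR the not() call actually produces.

```java
// Toy model of the Identity rule quoted above; not C2 code.
final class IdentityFoldModel {
    // Hypothetical stand-in for a C2 node with at most one input.
    static final class Node {
        final String op;
        final Node input; // null for a leaf

        Node(String op, Node input) {
            this.op = op;
            this.input = input;
        }
    }

    // Mirrors the shape of VectorMaskToLongNode::Identity:
    // VectorMaskToLong (VectorLongToMask l) => l, otherwise the node itself.
    static Node identity(Node n) {
        if (n.op.equals("VectorMaskToLong")
                && n.input != null
                && n.input.op.equals("VectorLongToMask")) {
            return n.input.input;
        }
        return n;
    }

    public static void main(String[] args) {
        Node l = new Node("LongInput", null);
        Node direct = new Node("VectorMaskToLong", new Node("VectorLongToMask", l));
        Node withNot = new Node("VectorMaskToLong",
                new Node("VectorMaskNot", new Node("VectorLongToMask", l)));
        // The direct round trip folds to the long input.
        System.out.println(identity(direct) == l);
        // With an extra mask op in between, the pattern no longer matches.
        System.out.println(identity(withNot) == withNot);
    }
}
```

This only shows that the quoted pattern stops matching. Whether some other optimization could still remove the operations is the open question raised in the comment above.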

@XiaohongGong
Author

XiaohongGong commented Oct 31, 2025

Hi @sviswa7, @jatin-bhateja , could you please help take a look at this PR, especially the x86 changes? Thanks so much!

Hi @PaulSandoz , @sviswa7, would you mind taking a look at the changes to the jdk.incubator.vector tests? Any feedback from you would be very helpful. Thanks a lot in advance!

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review

6 participants