8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE #27481

XiaohongGong · 2025-09-25T03:08:47Z

The current implementations of VectorMask.fromLong() and toLong() on AArch64 SVE are inefficient. SVE does not provide native predicate instructions for these operations, so they are implemented with vector instructions, even though the output of fromLong and the input of toLong are defined as masks held in predicate registers on SVE architectures.

For toLong(), the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For fromLong(), the opposite conversion is needed at the start of codegen.

These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.

This patch optimizes the implementation by leveraging two existing C2 IRs (VectorLoadMask/VectorStoreMask) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.
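The split described above can be pictured with a small plain-Java model. This is only a sketch of the data flow as the description reads, not HotSpot code: the class and method names are hypothetical, a predicate is modeled as a `boolean[]`, and the packed vector as a `byte[]` holding 0 or 1 per lane (at most 64 lanes).

```java
// Illustrative model only; every name here is hypothetical, not HotSpot or Vector API code.
final class MaskSplitSketch {
    // Stand-in for the layout conversion performed by VectorStoreMask:
    // predicate (one boolean per lane) -> packed vector (one 0/1 byte per lane).
    static byte[] storeMask(boolean[] predicate) {
        byte[] packed = new byte[predicate.length];
        for (int i = 0; i < predicate.length; i++) {
            packed[i] = (byte) (predicate[i] ? 1 : 0);
        }
        return packed;
    }

    // Stand-in for VectorLoadMask: packed vector -> predicate.
    static boolean[] loadMask(byte[] packed) {
        boolean[] predicate = new boolean[packed.length];
        for (int i = 0; i < packed.length; i++) {
            predicate[i] = packed[i] != 0;
        }
        return predicate;
    }

    // Vector-side half of toLong: gather bit 0 of every byte lane into a long.
    static long packedToLong(byte[] packed) {
        long bits = 0L;
        for (int i = 0; i < packed.length; i++) {
            bits |= (long) (packed[i] & 1) << i;
        }
        return bits;
    }

    // Vector-side half of fromLong: spread the low bits of a long into byte lanes.
    static byte[] longToPacked(long bits, int lanes) {
        byte[] packed = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            packed[i] = (byte) ((bits >>> i) & 1L);
        }
        return packed;
    }

    // toLong = layout conversion first, then the vector-based operation.
    static long toLong(boolean[] predicate) {
        return packedToLong(storeMask(predicate));
    }

    // fromLong = vector-based operation first, then the layout conversion.
    static boolean[] fromLong(long bits, int lanes) {
        return loadMask(longToPacked(bits, lanes));
    }
}
```

In this model the layout conversions are separate steps, which is the property that lets them be shared with, and optimized like, the same conversions used elsewhere.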

It also modifies the Vector API jtreg tests to improve coverage. Here are the details:

Fix the smoke tests of fromLong/toLong to make sure these APIs are actually tested. They were not well tested before, because in the original test the C2 IRs for fromLong and toLong were optimized out completely by the compiler due to the following IR identity:

  VectorMaskToLong (VectorLongToMask l) => l

Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2.

Refine the existing IR tests to verify the expected IR patterns after this patch, and use the exact CPU feature required on AArch64 for these ops: fromLong requires "svebitperm" rather than "sve2".
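The smoke-test fix described above can be sketched in plain Java. This is a model under stated assumptions, not the actual jtreg test: the class name, the input values, and the warmup count are hypothetical, and the lane-by-lane method stands in for the real call chain `VectorMask.fromLong(species, input).not().toLong()`.

```java
// Illustrative model only; names, inputs and the warmup count are assumptions.
final class SmokeTestSketch {
    // Bits that correspond to real lanes for a mask with the given lane count.
    static long laneBits(int lanes) {
        return lanes >= 64 ? -1L : (1L << lanes) - 1;
    }

    // Reference value for fromLong(input).not().toLong(): flip the input and
    // keep only the bits that map to lanes.
    static long expectedNotToLong(long input, int lanes) {
        return ~input & laneBits(lanes);
    }

    // Lane-by-lane model of the API chain (lanes <= 64). Because not() sits
    // between the two calls, toLong no longer directly consumes the result of
    // fromLong, so the round-trip identity does not match.
    static long modelFromLongNotToLong(long input, int lanes) {
        long out = 0L;
        for (int i = 0; i < lanes; i++) {
            boolean lane = ((input >>> i) & 1L) != 0; // fromLong
            boolean flipped = !lane;                  // not()
            if (flipped) {
                out |= 1L << i;                       // toLong
            }
        }
        return out;
    }

    public static void main(String[] args) {
        final int lanes = 2;                 // e.g. a 128-bit species with long lanes
        final int warmupIterations = 20_000; // assumed; enough calls for a JIT to kick in
        long[] inputs = {0L, 1L, 2L, 3L, -1L};
        for (int iter = 0; iter < warmupIterations; iter++) {
            for (long input : inputs) {
                long actual = modelFromLongNotToLong(input, lanes);
                if (actual != expectedNotToLong(input, lanes)) {
                    throw new AssertionError("mismatch for input " + input);
                }
            }
        }
        System.out.println("ok");
    }
}
```

The outer loop mirrors the added warmup loop: the checked code has to run often enough to be compiled, otherwise only the interpreter path is verified.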

Performance shows significant improvement on NVIDIA's Grace CPU.

Here is the performance data with -XX:UseSVE=2:

Benchmark                                   bits inputs Mode   Unit     Before       After    Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms  322151.976  1318576.736 4.09
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms  322187.144  1315736.931 4.08
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms  322213.330  1353272.882 4.19
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 1009426.292  1339834.833 1.32
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 1010311.371  1368379.465 1.35
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 1013333.729  1368077.534 1.35
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms  892649.449  1301954.698 1.45
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms  894593.615  1324922.719 1.48
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms  884498.938  1289828.319 1.45
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 1093444.011  1374164.132 1.25
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 1080117.255  1369234.390 1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 1076327.072  1373219.435 1.27

And here is the performance data with -XX:UseSVE=1:

Benchmark                                   bits inputs Mode   Unit   Before        After     Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms 686584.179   800329.010  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms 686184.083   801754.893  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms 686426.883   799058.199  1.16
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 945359.331  1179824.693  1.24
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 946546.502  1169208.723  1.23
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 943207.037  1176056.895  1.24
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms 874121.577  1179473.834  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms 881023.640  1180854.086  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms 880149.334  1160048.226  1.31
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 938451.594  1164668.529  1.24
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 939189.649  1187096.328  1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 938601.147  1181154.558  1.25
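The Gain column in both tables appears to be After divided by Before (throughput, so higher is better), truncated to two decimals; that reading is inferred from the numbers, not stated in the description. A minimal sketch of the calculation, with a hypothetical class name:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative helper only; the class and method names are hypothetical.
final class GainSketch {
    // Gain = After / Before, truncated (not rounded) to two decimal places.
    static BigDecimal gain(String before, String after) {
        return new BigDecimal(after).divide(new BigDecimal(before), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        // testToLongByte, 128 bits, 3 inputs, -XX:UseSVE=2
        System.out.println(gain("322213.330", "1353272.882")); // prints 4.19
    }
}
```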

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE (Enhancement - P4)

Reviewers

Hao Sun (@shqking - Committer) 🔄 Re-review required (review applies to 25538369)

Reviewers without OpenJDK IDs

@erifan (no known openjdk.org user name / role) 🔄 Re-review required (review applies to 25538369)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481
$ git checkout pull/27481

Update a local copy of the PR:
$ git checkout pull/27481
$ git pull https://git.openjdk.org/jdk.git pull/27481/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27481

View PR using the GUI difftool:
$ git pr show -t 27481

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27481.diff

bridgekeeper · 2025-09-25T03:10:12Z

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-09-25T03:11:44Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-09-25T03:12:25Z

@XiaohongGong The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-09-25T03:16:21Z

Webrevs

XiaohongGong · 2025-10-10T03:26:52Z

Hi, could anyone please help take a look at this PR? Thanks a lot in advance!

shqking

LGTM. Thanks for your work.

erifan

LGTM, reviewed internally.

eme64

I gave it a quick glance, and had some comments.

I'll run some testing, and review more fully after :)

src/hotspot/cpu/aarch64/aarch64_vector.ad

src/hotspot/cpu/riscv/riscv_v.ad

eme64 · 2025-10-21T12:52:03Z

@XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state?

XiaohongGong · 2025-10-22T04:11:30Z

> @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state?

Thanks for looking at this PR @eme64! I've rebased the PR onto master and addressed your comments. Please let me know if there are any other issues.

eme64 · 2025-10-22T06:27:26Z

@XiaohongGong Thanks for merging, running testing now :)

eme64

Tests passed :)

Now I have some understanding questions ;)

src/hotspot/cpu/aarch64/aarch64_vector.ad

eme64 · 2025-10-23T05:56:05Z

src/hotspot/share/opto/vectorIntrinsics.cpp

+  if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) {
    mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
  }


What does VectorStoreMaskNode do exactly?
Could you maybe add some short comment above the class definition of VectorStoreMaskNode?

I'm guessing it turns a predicate into a packed vector, right?
If that is correct, then it would make more sense to check something like

Suggested change

-  if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) {
-    mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
-  }
+  if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
+    mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
+  }

I'm wondering if the name VectorStoreMaskNode is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion?

VectorStoreMask is the opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask: it converts a vector mask (either an unpacked vector or a predicate) to a packed vector (i.e. 8-bit element size), because in the Java API the elements of a VectorMask are stored into a boolean array.

  if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) {
    mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
  }

Is the function name vector_mask_must_be_packed fine with you? This looks smarter to me.

Hi @XiaohongGong I am a bit confused with this condition here -

mask_vec->bottom_type()->isa_vectmask() == nullptr

So this means that mask_vec is not of type TypeVectMask right? Which means it is not a vector predicate/mask type? Then how can the VectorStoreMaskNode convert mask_vec predicate to a packed vector?

Hi @eme64, I pushed a commit renaming the matcher function to mask_op_uses_packed_vector. Is this fine with you? The main concern is that only the specific vector mask ops (VectorMaskOpNode) need the packed vector mask; the name vector_mask_must_be_packed might extend the scope to all vector/mask operations.

Sounds good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false?

Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?

@XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to vectornode.hpp. I also see that TypeVectMask has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier.

Hi @eme64, I'm afraid there is no place where we document these things now, and I agree that clear comments might be necessary. I've created a separate JBS issue to track this: https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion!

> Maybe @PaulSandoz has a good idea for a better naming of VectorLoadMask and VectorStoreMask?

IIUC these nodes represent conversions or casts:

VectorLoadMask - converts a vector register of 8-bit lanes representing a mask to a platform-specific mask register

VectorStoreMask - converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask

In theory we could model such conversions using VectorOperators as we do other conversions, which might hold some clues as to their names. There is already VectorMaskCastNode, but I believe that operates on the platform-specific mask register, casting between different vector species of the same length.

So perhaps we could rename to the following:

VectorLoadMask -> VectorCastB2MaskNode

VectorStoreMask -> VectorCastMask2BNode

Having a naming convention for the various mask representations might further help and influence those names:

BVectMask, vector register of 8-bit lanes representing the mask

NVectMask, vector register of N-bit lanes representing the mask; and

PVectMask, representing the platform-specific predicate/mask register, which might be the same as NVectMask on certain hardware.

Does that help?
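To make the proposed vocabulary concrete, here is a toy Java model of the case where PVectMask coincides with NVectMask, i.e. hardware without separate predicate registers. It is a sketch under assumptions: the class is hypothetical, N is fixed at 32, a set lane is modeled as all ones (-1) in the N-bit form and as 1 in the 8-bit form, and the method names simply borrow the proposed node names.

```java
// Illustrative model only; not HotSpot code. N = 32 is an arbitrary choice.
final class MaskNamingSketch {
    // VectorLoadMask / "VectorCastB2Mask": BVectMask (0/1 bytes) -> NVectMask
    // (0/-1 lanes of N bits), which doubles as the PVectMask in this model.
    static int[] castB2Mask(byte[] bVectMask) {
        int[] nVectMask = new int[bVectMask.length];
        for (int i = 0; i < bVectMask.length; i++) {
            nVectMask[i] = bVectMask[i] != 0 ? -1 : 0;
        }
        return nVectMask;
    }

    // VectorStoreMask / "VectorCastMask2B": NVectMask -> BVectMask.
    static byte[] castMask2B(int[] nVectMask) {
        byte[] bVectMask = new byte[nVectMask.length];
        for (int i = 0; i < nVectMask.length; i++) {
            bVectMask[i] = (byte) (nVectMask[i] != 0 ? 1 : 0);
        }
        return bVectMask;
    }
}
```

The BVectMask form matches the boolean-array view of a mask in the Java API, while the wider form is what lane-wise operations on such hardware would consume.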

@PaulSandoz That sounds like a great idea!

XiaohongGong · 2025-10-23T07:32:51Z

Hi @fg1417 , @Bhavana-Kilambi could you please help take a look at this PR especially the backend changes? Thanks a lot!

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp

src/hotspot/cpu/aarch64/aarch64_vector.ad

eme64 · 2025-10-24T07:25:55Z

src/hotspot/cpu/aarch64/aarch64_vector.ad

+        // These ops are implemented with predicate instructions if input
+        // mask is a predicate.
+        return vt->isa_vectmask() == nullptr;


If we had an assert above showing what else vt could be other than a vectmask, it would help in understanding this logic here ;)

vt is one of the normal TypeVect types (i.e. TypeVectA|S|D|X|Y|Z), chosen by the vector length in bytes as for other vector nodes. It is a plain TypeVect on architectures that do not support the predicate feature, where a mask is represented the same way as a vector.

src/hotspot/share/opto/vectorIntrinsics.cpp

XiaohongGong · 2025-10-28T05:54:02Z

Hi @eme64, I pushed a commit that renames the helper matcher function and adds some comments and an assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot!

eme64

@XiaohongGong Thanks for the updates. I left a few more comments.

And thanks for filing:
https://bugs.openjdk.org/browse/JDK-8370666
Are you planning on working on that, or do you know someone else?
I could try, but I'm less familiar with all the concepts, and would need a lot of help.

eme64 · 2025-10-28T09:46:30Z

src/hotspot/cpu/aarch64/aarch64_vector.ad

+      return false;
+    }
+
+    assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");


Suggested change

-    assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");
+    assert(vt->isa_vectmask() != nullptr, "The mask type must be a TypeVectMask on SVE");

Hotspot style guide does not like implicit null/zero checks ;)

eme64 · 2025-10-28T09:50:43Z

src/hotspot/share/opto/vectorIntrinsics.cpp

+  if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) {
    mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
  }


@PaulSandoz That sounds like a great idea!

src/hotspot/share/opto/matcher.hpp

XiaohongGong · 2025-10-29T07:55:45Z

> @XiaohongGong Thanks for the updates. I left a few more comments.
>
> And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.

Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?

eme64 · 2025-10-29T08:19:47Z

> > @XiaohongGong Thanks for the updates. I left a few more comments.
> > And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.
>
> Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?

That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :)

XiaohongGong · 2025-10-29T08:24:49Z

> > > @XiaohongGong Thanks for the updates. I left a few more comments.
> > > And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help.
> >
> > Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3–4 weeks. Would that be okay with you?
>
> That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :)

OK, I will do my best to start on it in a few weeks. Thanks!

eme64 · 2025-10-30T07:19:24Z

src/hotspot/share/opto/matcher.hpp

+  // Identify if a vector mask operation prefers the input/output mask to be
+  // saved with a predicate type or not.
+  // - Return true if it prefers a predicate type (i.e. TypeVectMask).
+  // - Return false if it prefers a general vector type (i.e. TypeVectA to TypeVectZ).
+  static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt);


Nice, this looks much clearer now, thanks for the updates :)

I'll have a look at the whole PR at a later point.

eme64

Generally, this patch looks reasonable, but I'm not an aarch64 or x64 specialist for these ops.

I think we have had sufficient aarch64 specialists look at this already.

But I'd like to ping @sviswa7 and @jatin-bhateja to sanity check the x64 changes, IR rules etc :)

After approval from x64 folks, I can offer to do some internal testing :)

eme64 · 2025-10-30T07:27:43Z

test/jdk/jdk/incubator/vector/Long128VectorTests.java

+            // Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler.
+            long outputLong = vmask.not().toLong();


That sounds a bit fragile. Is there something that would catch if it did ever get optimized away?

I'm not sure. But currently fromLong followed directly by toLong is folded back to the long input by this Identity:

jdk/src/hotspot/share/opto/vectornode.cpp, lines 1926 to 1931 in 6347f10:

  Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) {
    if (in(1)->Opcode() == Op_VectorLongToMask) {
      return in(1)->in(1);
    }
    return this;
  }

So the original tests could not really exercise these two APIs. As a smoke test, though, it was meant to verify the correctness of the Java-level APIs rather than the HotSpot intrinsification.

XiaohongGong · 2025-10-31T01:40:13Z

Hi @sviswa7, @jatin-bhateja, could you please help take a look at this PR, especially the x86 changes? Thanks so much!

Hi @PaulSandoz, @sviswa7, would you mind taking a look at the changes to the jdk.incubator.vector tests? Any feedback from you would be very helpful. Thanks a lot in advance!

8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE (2553836)

openjdk bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label Sep 25, 2025

openjdk bot added the rfr (Pull request is ready for review) label Sep 25, 2025

shqking approved these changes Oct 21, 2025

erifan approved these changes Oct 21, 2025

eme64 reviewed Oct 21, 2025

XiaohongGong added 2 commits October 22, 2025 02:50

Merge 'jdk:master' into JDK-8367292 (2c8a3a9)

Move function comments to matcher.hpp (d3e5b0f)

eme64 reviewed Oct 23, 2025

Bhavana-Kilambi reviewed Oct 23, 2025

Rename the matcher function and fix comment issue (612c612)

eme64 reviewed Oct 24, 2025

Bhavana-Kilambi reviewed Oct 24, 2025

Rename matcher helper function to "mask_op_prefers_predicate" and add more comments (3a40fc2)

eme64 reviewed Oct 28, 2025

Update comments (40c2df0)

eme64 reviewed Oct 30, 2025
