8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns #28313
`VectorMaskCastNode` casts a vector mask from one type to another. The
cast may be generated by a call to the Vector API `cast` method, or
inserted by the compiler. For example, some vector mask operations
like `trueCount` require the input mask to be of an integer type, so
for floating-point masks the compiler automatically casts the mask to
the corresponding integer-type mask before performing the mask
operation. This kind of cast is very common.
If the vector element size is unchanged, the `VectorMaskCastNode`
generates no code; otherwise, code is generated to extend or narrow
the mask. This IR node is not free whether or not it generates code,
because it may block some optimizations. For example:
1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
The middle `VectorMaskCast` prevents the following optimization:
`(VectorStoreMask (VectorLoadMask x)) => (x)`
2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
In these IR patterns, the value of the input `x` is not changed, so we
can safely do the optimization. But if the input value is changed, we
can't eliminate the cast.
The general idea of this PR is to introduce an `uncast_mask` helper
function, which can be used to uncast a chain of `VectorMaskCastNode`s,
like the existing `Node::uncast(bool)` function. The function returns
the first non-`VectorMaskCastNode`.
The intended use case is when the IR pattern to be optimized may
contain one or more consecutive `VectorMaskCastNode` and this does not
affect the correctness of the optimization. Then this function can be
called to eliminate the `VectorMaskCastNode` chain.
Current optimizations related to `VectorMaskCastNode` include:
1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
This PR does the following optimizations:
1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)`
to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`,
because the optimization is correct as long as the types of the head
and tail `VectorMaskCastNode` are consistent.
2. Supports a new optimization pattern
`(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vector, it
remains unchanged as long as the vector length remains the same, and
this is guaranteed at the API level.
I conducted some simple research on different mask generation methods
and mask operations, and obtained the following table, which includes
some potential optimization opportunities that may use this `uncast_mask`
function.
```
mask_gen\op toLong anyTrue allTrue trueCount firstTrue lastTrue
compare N/A N/A N/A N/A N/A N/A
maskAll TBI TBI TBI TBI TBI TBI
fromLong TBI TBI N/A TBI TBI TBI
mask_gen\op and or xor andNot not laneIsSet
compare N/A N/A N/A N/A TBI N/A
maskAll TBI TBI TBI TBI TBI TBI
fromLong N/A N/A N/A N/A TBI TBI
```
`TBI` indicates that there may be a potential optimization here that
requires further investigation.
Benchmarks:
On a Nvidia Grace machine with 128-bit SVE2:
```
Benchmark Unit Before Error After Error Uplift
microMaskLoadCastStoreByte64 ops/us 59.23 0.21 148.12 0.07 2.50
microMaskLoadCastStoreDouble128 ops/us 2.43 0.00 38.31 0.01 15.73
microMaskLoadCastStoreFloat128 ops/us 6.19 0.00 75.67 0.11 12.22
microMaskLoadCastStoreInt128 ops/us 6.19 0.00 75.67 0.03 12.22
microMaskLoadCastStoreLong128 ops/us 2.43 0.00 38.32 0.01 15.74
microMaskLoadCastStoreShort64 ops/us 28.89 0.02 75.60 0.09 2.62
```
On a Nvidia Grace machine with 128-bit NEON:
```
Benchmark Unit Before Error After Error Uplift
microMaskLoadCastStoreByte64 ops/us 75.75 0.19 149.74 0.08 1.98
microMaskLoadCastStoreDouble128 ops/us 8.71 0.03 38.71 0.05 4.44
microMaskLoadCastStoreFloat128 ops/us 24.05 0.03 76.49 0.05 3.18
microMaskLoadCastStoreInt128 ops/us 24.06 0.02 76.51 0.05 3.18
microMaskLoadCastStoreLong128 ops/us 8.72 0.01 38.71 0.02 4.44
microMaskLoadCastStoreShort64 ops/us 24.64 0.01 76.43 0.06 3.10
```
On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark Unit Before Error After Error Uplift
microMaskLoadCastStoreByte64 ops/us 82.13 0.31 115.14 0.08 1.40
microMaskLoadCastStoreDouble128 ops/us 0.32 0.00 0.32 0.00 1.01
microMaskLoadCastStoreFloat128 ops/us 42.18 0.05 57.56 0.07 1.36
microMaskLoadCastStoreInt128 ops/us 42.19 0.01 57.53 0.08 1.36
microMaskLoadCastStoreLong128 ops/us 0.30 0.01 0.32 0.00 1.05
microMaskLoadCastStoreShort64 ops/us 42.18 0.05 57.59 0.01 1.37
```
On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark Unit Before Error After Error Uplift
microMaskLoadCastStoreByte64 ops/us 73.53 0.20 114.98 0.03 1.56
microMaskLoadCastStoreDouble128 ops/us 0.29 0.01 0.30 0.00 1.00
microMaskLoadCastStoreFloat128 ops/us 30.78 0.14 57.50 0.01 1.87
microMaskLoadCastStoreInt128 ops/us 30.65 0.26 57.50 0.01 1.88
microMaskLoadCastStoreLong128 ops/us 0.30 0.00 0.30 0.00 0.99
microMaskLoadCastStoreShort64 ops/us 24.92 0.00 57.49 0.01 2.31
```
On an AMD EPYC 9124 16-Core Processor with AVX1:
```
Benchmark Unit Before Error After Error Uplift
microMaskLoadCastStoreByte64 ops/us 79.68 0.01 248.49 0.91 3.12
microMaskLoadCastStoreDouble128 ops/us 0.28 0.00 0.28 0.00 1.00
microMaskLoadCastStoreFloat128 ops/us 31.11 0.04 95.48 2.27 3.07
microMaskLoadCastStoreInt128 ops/us 31.10 0.03 99.94 1.87 3.21
microMaskLoadCastStoreLong128 ops/us 0.28 0.00 0.28 0.00 0.99
microMaskLoadCastStoreShort64 ops/us 31.11 0.02 94.97 2.30 3.05
```
This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64
environments and on two 512-bit x64 machines, covering various
configurations including SVE2, SVE1, NEON, AVX3, AVX2, AVX1, SSE4, and
SSE3; all tests passed.
erifan: Updated the JMH benchmarks and the new test results.
galderz left a comment:
Nice improvement @erifan, just some small comments from me
```
// (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)
// x remains to be a bool vector with no changes.
// This function can be used to eliminate the VectorMaskCast in such patterns.
Node* VectorNode::uncast_mask(Node* n) {
```
galderz: Could this be a static method instead?
erifan: Yeah, it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141
Or do you mean a static method limited to this file? If so, I'd prefer not to; it may be used in other places. Thanks~
```
@IR(counts = { IRNode.VECTOR_MASK_CAST, "= 0" },
    applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"})
public static int testTwoCastToDifferentType2() {
    // The types before and after the two casts are not the same, so the cast cannot be eliminated.
```
galderz: Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX2 it's removed. Also, this comment is outdated.
erifan: This is because the following optimization on AVX2 affects this one:
`(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x`
On AVX2, `trueCount()` requires converting the mask to a boolean vector first via `VectorStoreMask`, so `VectorStoreMask` can apply the above optimization, which eliminates all `VectorMaskCast` nodes as a side effect.
On AVX-512, masks use dedicated mask registers (`k` registers), so `VectorStoreMask` is not generated for `trueCount()` and the `VectorMaskCast` nodes remain.
I reorganised this file, please take another look, thanks~
```
@IR(counts = { IRNode.VECTOR_MASK_CAST, "= 0" },
    applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"})
public static int testTwoCastToDifferentType() {
    // The types before and after the two casts are not the same, so the cast cannot be eliminated.
```
galderz: Outdated comment. Also, please expand the assertion comments.
erifan: Done, thanks!
```
}

@Test
@IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
```
galderz: Could you add some assertion comments here as well, to explain what causes the differences between architectures?
erifan: Done
```
}

@Test
@IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
```
galderz: Same here.
erifan: Done
erifan left a comment:
Thanks for your review! @galderz