8380424: C2: Fix missing identity optimization for vector nodes #30529

Open
erifan wants to merge 24 commits into openjdk:master from erifan:JDK-8380424-miss-identity-opt

@erifan erifan commented Apr 1, 2026

Ideal and Identity optimizations take effect only when all input nodes of the IR pattern are in place. However, node generation in the incremental inlining phase is unordered, so downstream nodes in an IR pattern are sometimes generated before their upstream nodes, and the Ideal or Identity optimization is missed. If no later step triggers the optimization again, it is missed permanently.

Vector nodes (especially those generated by the Vector API) are often wrapped in a VectorBoxNode during generation, and these box and unbox nodes further hinder the matching of IR optimization patterns. The -XX:VerifyIterativeGVN option lets us check which IGVN optimizations are missed; currently, however, this verification is skipped for vector nodes. Enabling the Identity optimization check for vector nodes makes many tests fail, as listed below.

jdk/incubator/vector/ByteVector128LoadStoreTests.java
jdk/incubator/vector/ByteVector256LoadStoreTests.java
jdk/incubator/vector/ByteVector512LoadStoreTests.java
jdk/incubator/vector/ByteVector64LoadStoreTests.java
jdk/incubator/vector/ByteVectorMaxLoadStoreTests.java
jdk/incubator/vector/ShortVector128LoadStoreTests.java
jdk/incubator/vector/ShortVector256LoadStoreTests.java
jdk/incubator/vector/ShortVector512LoadStoreTests.java
jdk/incubator/vector/ShortVector64LoadStoreTests.java
jdk/incubator/vector/ShortVectorMaxLoadStoreTests.java
jdk/incubator/vector/Vector512ConversionTests.java
jdk/incubator/vector/Vector64ConversionTests.java#id0
jdk/incubator/vector/VectorMaxConversionTests.java#id0

These failures are caused by missed optimizations in AndVNode::Identity() and ShiftVNode::Identity(). From JDK-8370863, we also know that VectorStoreMaskNode::Identity() may be missed.

To recover these potential missed optimizations, we need to trigger them again at appropriate points. Currently, a GVN optimization runs once during node generation, and if no subsequent changes are made, the node will not be added to the IGVN worklist to trigger IGVN optimization again. Therefore, the corresponding nodes need to be added to the IGVN worklist at appropriate points.

Many phases affect the shape of the node tree, but inlining and boxing have a particularly significant impact on vector nodes. After PhaseVector, inlining is complete, and vector boxing/unboxing has been eliminated. At this point, the node tree is fully materialized, with no additional interfering nodes. Therefore, this PR adds all nodes to the IGVN worklist at this point to recover potentially missed GVN optimizations.
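The idea of re-enqueueing every node once the graph has settled can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyIgvn`, its `Node` class, the `"Box"` opcode and the `(Store (Load x)) => x` rule are all hypothetical stand-ins, not C2 code, and the silent rewiring of `store.in` merely simulates a rule that was evaluated before its pattern was complete.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy model of a missed Identity optimization. Hypothetical; not C2 code. */
final class ToyIgvn {
    /** A node with an opcode name and at most one input. */
    static final class Node {
        final String op;
        Node in; // mutable so the demo can rewire it after the first pass
        Node(String op, Node in) { this.op = op; this.in = in; }
    }

    /** Toy rule: (Store (Load x)) => x. Returns n itself when the pattern does not match. */
    static Node identity(Node n) {
        if (n.op.equals("Store") && n.in != null
                && n.in.op.equals("Load") && n.in.in != null) {
            return n.in.in;
        }
        return n;
    }

    /** Drains the worklist and counts the nodes for which the rule fired. */
    static int runWorklist(Deque<Node> worklist) {
        int hits = 0;
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (identity(n) != n) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns how often the rule fired, with or without re-enqueueing all nodes. */
    static int demo(boolean requeueAll) {
        Node x = new Node("x", null);
        Node load = new Node("Load", x);
        Node box = new Node("Box", load);    // interfering wrapper node
        Node store = new Node("Store", box);
        List<Node> all = List.of(x, load, box, store);

        Deque<Node> worklist = new ArrayDeque<>();
        worklist.push(store);
        int hits = runWorklist(worklist);    // pattern is blocked by Box

        store.in = load;                     // wrapper removed, nobody is notified

        if (requeueAll) {
            worklist.addAll(all);            // the recovery step sketched here
        }
        hits += runWorklist(worklist);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("without requeue: " + demo(false));
        System.out.println("with requeue:    " + demo(true));
    }
}
```

In this model the rule never fires unless the nodes are put back on the worklist after the wrapper has gone, which is the effect the PR aims for after PhaseVector.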

However, this change alone cannot recover optimizations that are missed after PhaseVector, so this PR also enhances the notification of multi-hop IR optimization patterns in add_users_of_use_to_worklist.
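A multi-hop notification can be pictured as follows: when a node changes, not only its direct users but also users further down the use chain are candidates for re-evaluation, because their patterns look through intermediate nodes. The sketch below is a generic, depth-bounded version written for illustration only. The class `MultiHopNotify` and the method `usersWithin` are hypothetical, and the real `add_users_of_use_to_worklist` may match specific opcodes rather than walk a fixed number of hops.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy depth-bounded user notification. Hypothetical; not C2 code. */
final class MultiHopNotify {
    static final class Node {
        final String name;
        final List<Node> users = new ArrayList<>();
        Node(String name) { this.name = name; }
        void addUser(Node u) { users.add(u); }
    }

    /** Collects the names of all users reachable within maxHops use edges of 'changed'. */
    static Set<String> usersWithin(Node changed, int maxHops) {
        Set<String> notified = new LinkedHashSet<>();
        List<Node> frontier = List.of(changed);
        for (int hop = 0; hop < maxHops; hop++) {
            List<Node> next = new ArrayList<>();
            for (Node n : frontier) {
                for (Node u : n.users) {
                    if (notified.add(u.name)) {
                        next.add(u); // first time seen: explore its users on the next hop
                    }
                }
            }
            frontier = next;
        }
        return notified;
    }

    /** Builds VectorLoadMask -> VectorMaskCast -> VectorStoreMask and notifies from the load. */
    static Set<String> demo(int hops) {
        Node load = new Node("VectorLoadMask");
        Node cast = new Node("VectorMaskCast");
        Node store = new Node("VectorStoreMask");
        load.addUser(cast);
        cast.addUser(store);
        return usersWithin(load, hops);
    }

    public static void main(String[] args) {
        System.out.println(demo(1));
        System.out.println(demo(2));
    }
}
```

With a single hop only the cast is notified; the store, whose Identity pattern looks through the cast, is reached only with two hops.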

With this PR, the tests listed above passed in 100 repeated runs, so this PR enables identity optimization verification for vector nodes. We expect that with this PR, vector identity optimization misses will be very rare; if they do occur, we should fix them rather than skip them.

This PR does not enable Ideal optimization verification for vector nodes because the inputs of some commutative nodes may be swapped in Ideal, causing changes in the hash value, which could lead to verification failure.
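Why a swap of inputs can change a hash value is easy to see with any order-sensitive hash. The snippet below is a toy illustration only: `CommutativeHashDemo` and its `hash` method are hypothetical and use `java.util.Objects.hash`, not C2's own node hashing, and how the verifier consumes the hash is not modeled.

```java
import java.util.Objects;

/** Toy order-sensitive hash over an opcode and two input ids. Hypothetical; not C2 code. */
final class CommutativeHashDemo {
    static int hash(String op, int in1, int in2) {
        // Objects.hash combines its arguments in order, so position matters.
        return Objects.hash(op, in1, in2);
    }

    /** True when swapping the two inputs yields a different hash value. */
    static boolean swapChangesHash(String op, int in1, int in2) {
        return hash(op, in1, in2) != hash(op, in2, in1);
    }

    public static void main(String[] args) {
        System.out.println(swapChangesHash("AddV", 1, 2));
        System.out.println(swapChangesHash("AddV", 3, 3));
    }
}
```

The operation is mathematically the same after the swap, yet the order-sensitive hash differs whenever the two inputs are distinct.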

We also found many test failures caused by missed ShenandoahLoadReferenceBarrierNode::Identity() optimizations. This PR skips identity verification for ShenandoahLoadReferenceBarrierNode because that case was not investigated here.

This PR tested all tier1 to tier3 jtreg tests on aarch64 (sve, neon) and x64 (avx3, avx2) platforms using options -ea -esa -XX:-TieredCompilation -XX:CompileThreshold=100 -XX:VerifyIterativeGVN=1110, and repeated the test 100 times for the aforementioned error cases. All tests passed.



Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8380424: C2: Fix missing identity optimization for vector nodes (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30529/head:pull/30529
$ git checkout pull/30529

Update a local copy of the PR:
$ git checkout pull/30529
$ git pull https://git.openjdk.org/jdk.git pull/30529/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30529

View PR using the GUI difftool:
$ git pr show -t 30529

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30529.diff

Using Webrev

Link to Webrev Comment

…terns

`VectorMaskCastNode` is used to cast a vector mask from one type to
another type. The cast may be generated by calling the vector API `cast`
or generated by the compiler. For example, some vector mask operations
like `trueCount` require the input mask to be integer types, so for
floating point type masks, the compiler will cast the mask to the
corresponding integer type mask automatically before doing the mask
operation. This kind of cast is very common.

If the vector element size is not changed, the `VectorMaskCastNode`
does not generate code; otherwise code is generated to extend or narrow
the mask. This IR node is not free whether or not it generates code,
because it may block some optimizations. For example:
1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
The middle `VectorMaskCast` prevents the following optimization:
`(VectorStoreMask (VectorLoadMask x)) => (x)`
2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.

In these IR patterns, the value of the input `x` is not changed, so we
can safely do the optimization. But if the input value is changed, we
can't eliminate the cast.

The general idea of this PR is to introduce an `uncast_mask` helper
function, which can be used to uncast a chain of `VectorMaskCastNode`s,
like the existing `Node::uncast(bool)` function. The function returns
the first node that is not a `VectorMaskCastNode`.

The intended use case is when the IR pattern to be optimized may
contain one or more consecutive `VectorMaskCastNode` and this does not
affect the correctness of the optimization. Then this function can be
called to eliminate the `VectorMaskCastNode` chain.
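The chain walk described above can be sketched in a few lines. This is a minimal model, assuming a hypothetical single-input `Node` class with a string opcode; the real helper operates on C2 `Node` objects in C++, and the name `uncastMask` merely mirrors `uncast_mask`.

```java
/** Toy version of the uncast_mask idea. Hypothetical; not C2 code. */
final class UncastMaskSketch {
    /** Minimal stand-in for an IR node: an opcode name and one input (null for leaves). */
    static final class Node {
        final String op;
        final Node in;
        Node(String op, Node in) { this.op = op; this.in = in; }
    }

    /** Skips consecutive VectorMaskCast nodes and returns the first node that is not one. */
    static Node uncastMask(Node n) {
        while (n != null && n.op.equals("VectorMaskCast")) {
            n = n.in;
        }
        return n;
    }

    /** Builds (VectorMaskCast (VectorMaskCast (VectorLoadMask x))) and reports what the walk finds. */
    static String demo() {
        Node x = new Node("x", null);
        Node load = new Node("VectorLoadMask", x);
        Node cast = new Node("VectorMaskCast", new Node("VectorMaskCast", load));
        return uncastMask(cast).op;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A node that is not a cast is returned unchanged, so callers can apply the helper unconditionally before matching the rest of a pattern.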

Current optimizations related to `VectorMaskCastNode` include:
1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.

This PR does the following optimizations:
1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)`
to `(VectorMaskCast (VectorMaskCast  ... (VectorMaskCast x))) => (x)`.
As long as the types of the head and tail `VectorMaskCastNode` are
consistent, the optimization is correct.
2. Supports a new optimization pattern
`(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vector, it
remains unchanged as long as the vector length remains the same, and
this is guaranteed at the API level.
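The second pattern can be modeled as an identity rule that matches both ends of the chain. The sketch below is illustrative only: `StoreMaskIdentitySketch`, its `Node` class and the `lanes` field are hypothetical, and the explicit lane-count comparison is a conservative assumption added for the example, since the text above says equal vector length is already guaranteed at the API level.

```java
/** Toy identity rule for (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))). Not C2 code. */
final class StoreMaskIdentitySketch {
    static final class Node {
        final String op;
        final int lanes; // number of mask lanes carried by this node
        final Node in;
        Node(String op, int lanes, Node in) { this.op = op; this.lanes = lanes; this.in = in; }
    }

    /** Returns x when the whole pattern matches, otherwise the node itself. */
    static Node identity(Node n) {
        if (!n.op.equals("VectorStoreMask")) {
            return n;
        }
        Node in = n.in;
        while (in != null && in.op.equals("VectorMaskCast")) {
            in = in.in; // look through any number of casts
        }
        if (in != null && in.op.equals("VectorLoadMask") && in.lanes == n.lanes) {
            return in.in; // the boolean vector x that was loaded
        }
        return n;
    }

    /** Builds store(cast(cast(load(x)))) and reports whether it folds to x. */
    static boolean foldsToInput(int loadLanes, int storeLanes) {
        Node x = new Node("x", loadLanes, null);
        Node load = new Node("VectorLoadMask", loadLanes, x);
        Node cast1 = new Node("VectorMaskCast", loadLanes, load);
        Node cast2 = new Node("VectorMaskCast", storeLanes, cast1);
        Node store = new Node("VectorStoreMask", storeLanes, cast2);
        return identity(store) == x;
    }

    public static void main(String[] args) {
        System.out.println(foldsToInput(4, 4));
        System.out.println(foldsToInput(4, 8));
    }
}
```

When the lane counts agree the store folds to the original boolean vector; the mismatching case is included only to show the guard, not because the Vector API can produce it.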

I conducted some simple research on different mask generation methods
and mask operations, and obtained the following table, which includes
some potential optimization opportunities that may use this `uncast_mask`
function.

```
mask_gen\op    toLong   anyTrue allTrue trueCount firstTrue lastTrue
compare        N/A      N/A     N/A     N/A       N/A       N/A
maskAll        TBI      TBI     TBI     TBI       TBI       TBI
fromLong       TBI      TBI     N/A     TBI       TBI       TBI

mask_gen\op    and      or      xor     andNot    not       laneIsSet
compare        N/A      N/A     N/A     N/A       TBI       N/A
maskAll        TBI      TBI     TBI     TBI       TBI       TBI
fromLong       N/A      N/A     N/A     N/A       TBI       TBI
```
`TBI` indicates that there may be potential optimizations here that
require further investigation.

Benchmarks:

On a Nvidia Grace machine with 128-bit SVE2:
```
Benchmark			Unit	Before	Error	After	Error	Uplift
microMaskLoadCastStoreByte64	ops/us	59.23	0.21	148.12	0.07	2.50
microMaskLoadCastStoreDouble128	ops/us	2.43	0.00	38.31	0.01	15.73
microMaskLoadCastStoreFloat128	ops/us	6.19	0.00	75.67	0.11	12.22
microMaskLoadCastStoreInt128	ops/us	6.19	0.00	75.67	0.03	12.22
microMaskLoadCastStoreLong128	ops/us	2.43	0.00	38.32	0.01	15.74
microMaskLoadCastStoreShort64	ops/us	28.89	0.02	75.60	0.09	2.62
```

On a Nvidia Grace machine with 128-bit NEON:
```
Benchmark			Unit	Before	Error	After	Error	Uplift
microMaskLoadCastStoreByte64	ops/us	75.75	0.19	149.74	0.08	1.98
microMaskLoadCastStoreDouble128	ops/us	8.71	0.03	38.71	0.05	4.44
microMaskLoadCastStoreFloat128	ops/us	24.05	0.03	76.49	0.05	3.18
microMaskLoadCastStoreInt128	ops/us	24.06	0.02	76.51	0.05	3.18
microMaskLoadCastStoreLong128	ops/us	8.72	0.01	38.71	0.02	4.44
microMaskLoadCastStoreShort64	ops/us	24.64	0.01	76.43	0.06	3.10
```

On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark			Unit	Before	Error	After	Error	Uplift
microMaskLoadCastStoreByte64	ops/us	82.13	0.31	115.14	0.08	1.40
microMaskLoadCastStoreDouble128	ops/us	0.32	0.00	0.32	0.00	1.01
microMaskLoadCastStoreFloat128	ops/us	42.18	0.05	57.56	0.07	1.36
microMaskLoadCastStoreInt128	ops/us	42.19	0.01	57.53	0.08	1.36
microMaskLoadCastStoreLong128	ops/us	0.30	0.01	0.32	0.00	1.05
microMaskLoadCastStoreShort64	ops/us	42.18	0.05	57.59	0.01	1.37
```

On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark			Unit	Before	Error	After	Error	Uplift
microMaskLoadCastStoreByte64	ops/us	73.53	0.20	114.98	0.03	1.56
microMaskLoadCastStoreDouble128	ops/us	0.29	0.01	0.30	0.00	1.00
microMaskLoadCastStoreFloat128	ops/us	30.78	0.14	57.50	0.01	1.87
microMaskLoadCastStoreInt128	ops/us	30.65	0.26	57.50	0.01	1.88
microMaskLoadCastStoreLong128	ops/us	0.30	0.00	0.30	0.00	0.99
microMaskLoadCastStoreShort64	ops/us	24.92	0.00	57.49	0.01	2.31
```

On an AMD EPYC 9124 16-Core Processor with AVX1:
```
Benchmark			Unit	Before	Error	After	Error	Uplift
microMaskLoadCastStoreByte64	ops/us	79.68	0.01	248.49	0.91	3.12
microMaskLoadCastStoreDouble128	ops/us	0.28	0.00	0.28	0.00	1.00
microMaskLoadCastStoreFloat128	ops/us	31.11	0.04	95.48	2.27	3.07
microMaskLoadCastStoreInt128	ops/us	31.10	0.03	99.94	1.87	3.21
microMaskLoadCastStoreLong128	ops/us	0.28	0.00	0.28	0.00	0.99
microMaskLoadCastStoreShort64	ops/us	31.11	0.02	94.97	2.30	3.05
```

This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64
environments, and two 512-bit x64 machines with various configurations,
including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3. All tests
passed.

bridgekeeper Bot commented Apr 1, 2026

👋 Welcome back erfang! A progress list of the required criteria for merging this PR into pr/28313 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk Bot commented Apr 1, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 1, 2026

openjdk Bot commented Apr 1, 2026

@erifan The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 1, 2026

mlbridge Bot commented Apr 1, 2026

Webrevs

@galderz galderz left a comment

Thanks for the PR!

Comment thread src/hotspot/share/opto/vector.cpp Outdated
void PhaseVector::add_all_nodes_into_igvn_worklist() {
ResourceMark rm;
Unique_Node_List useful;
C->identify_useful_nodes(useful);
Contributor

Is there any risk that this produces a lot of nodes that are unrelated to what you are trying to achieve here? From what I can see, identify_useful_nodes seems to add the entire graph of live nodes, and that could be a lot. Maybe this works well when the graph has a lot of vector nodes, but if the graph has a mix of scalar and vector nodes, maybe this could get out of hand?

Another thing that is noticeable is that PhaseVector::do_cleanup calls PhaseRemoveUseless just before add_all_nodes_into_igvn_worklist is called, and PhaseRemoveUseless also invokes identify_useful_nodes and stores them into _useful. This is accessible via PhaseRemoveUseless::get_useful. Could you piggyback on that?

Contributor Author

Thanks, good point.

I did a quick compile-time check on a Neoverse-V2 machine with the same case:

  // mul add int vd = va * vb + vc
  public static void testMulAddInt() {
    for (int i = 0; i < LENGTH; i += I_SPECIES.length()) {
      IntVector va = IntVector.fromArray(I_SPECIES, ia, i);
      IntVector vb = IntVector.fromArray(I_SPECIES, ib, i);
      IntVector vc = IntVector.fromArray(I_SPECIES, ic, i);
      va.mul(vb).add(vc).intoArray(ir, i);
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 10001; i++) {
        testMulAddInt();
    }
  }

java -Xbatch -XX:-UseOnStackReplacement -XX:CompileCommand="compileonly,Test::test*" -XX:-TieredCompilation -XX:CompileCommand="print,Test::test*" -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation Test.java > assembly.s

I ran both jdk-master and this PR build 10 times and measured the C2 compile time for Test::testMulAddInt from hotspot_pid*.log. The averages were 38.3 ms on jdk-master and 37.6 ms with this PR, so I do not see a noticeable compile-time regression from this change.

| build  | avg     | min     | max     | stdev   |
|--------|---------|---------|---------|---------|
| master | 38.3 ms | 37.0 ms | 40.0 ms | 0.82 ms |
| PR     | 37.6 ms | 37.0 ms | 39.0 ms | 0.70 ms |

Also, as Peter suggested earlier in #28313: "Adding all nodes might be the most reliable in the long-run. We may at some point have idealization rules that start at a scalar node, and traverse up to find vector nodes."

I also took your suggestion and now reuse PhaseRemoveUseless::get_useful() instead of recomputing the useful-node set.

import compiler.lib.ir_framework.*;
import jdk.incubator.vector.*;

public class VectorStoreMaskIdentityStressTest {
Contributor

Not in this class, but should there be additional IR tests for AndV and ShiftV?

Contributor Author

After we enable the identity check for vector nodes, with options -ea -esa -XX:-TieredCompilation -XX:CompileThreshold=100 -XX:VerifyIterativeGVN=1110, test files like jdk/incubator/vector/ByteVector128LoadStoreTests.java consistently fail; therefore I think there's no need to add any additional tests for AndV and ShiftV.

This test, however, fails only intermittently, so I feel it is necessary to add a stress test.

@openjdk-notifier openjdk-notifier Bot changed the base branch from pr/28313 to master April 15, 2026 08:35

The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8380424-miss-identity-opt
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push


openjdk Bot commented Apr 15, 2026

@erifan this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8380424-miss-identity-opt
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk Bot added merge-conflict Pull request has merge conflict with target branch and removed rfr Pull request is ready for review labels Apr 15, 2026

openjdk Bot commented Apr 16, 2026

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.


erifan commented Apr 17, 2026

/template append


openjdk Bot commented Apr 17, 2026

@erifan The pull request template has been appended to the pull request body

@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 17, 2026
@openjdk openjdk Bot removed the merge-conflict Pull request has merge conflict with target branch label Apr 20, 2026
@erifan erifan left a comment

Thanks for your review!

erifan commented May 12, 2026

Hi @eme64 @galderz This PR aims to fix the missing identity optimizations for some vector nodes. Would you mind taking a look? Thanks~


eme64 commented May 12, 2026

@erifan Thanks for working on this! It is important to get the Vector API nicely optimized.

I'm currently extremely busy with high-priority tasks. I'll get back to this later.
Maybe @iwanowww can also help review here.


erifan commented May 12, 2026

> @erifan Thanks for working on this! It is important to get the Vector API nicely optimized.
>
> I'm currently extremely busy with high-priority tasks. I'll get back to this later. Maybe @iwanowww can also help review here.

Ok, that's fine.
Hi @iwanowww , when you have a chance, could you please help review this PR? Thanks


erifan commented May 21, 2026

Hello, could someone please help review this PR? It fixes some missing vector identity optimizations.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review
