Skip to content

8342095: Add autovectorizer support for subword vector casts#23413

Open
jaskarth wants to merge 22 commits intoopenjdk:masterfrom
jaskarth:vectorize-subword
Open

8342095: Add autovectorizer support for subword vector casts#23413
jaskarth wants to merge 22 commits intoopenjdk:masterfrom
jaskarth:vectorize-subword

Conversation

@jaskarth
Copy link
Member

@jaskarth jaskarth commented Feb 3, 2025

Hi all,
This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the VTransform graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing VectorCastX2Y logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:

                                                  Baseline                    Patch
Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score    Error  Units    Improvement
VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)

I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342095: Add autovectorizer support for subword vector casts (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413
$ git checkout pull/23413

Update a local copy of the PR:
$ git checkout pull/23413
$ git pull https://git.openjdk.org/jdk.git pull/23413/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23413

View PR using the GUI difftool:
$ git pr show -t 23413

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23413.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 3, 2025

👋 Welcome back jkarthikeyan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 3, 2025

@jaskarth This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot changed the title JDK-8342095: Add autovectorizer support for subword vector casts 8342095: Add autovectorizer support for subword vector casts Feb 3, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 3, 2025
@openjdk
Copy link

openjdk bot commented Feb 3, 2025

@jaskarth The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Feb 3, 2025
@mlbridge
Copy link

mlbridge bot commented Feb 3, 2025


/*
* @test
* @bug 8183390 8340010 8342095
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note: This test was missing a @bug annotation, so I went through the history and added the ones that substantially modified the test.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, looks generally amazing 😊

I left a few comments below.

@jaskarth
Copy link
Member Author

jaskarth commented Feb 6, 2025

@eme64 I've pushed a new version that implements widening, as well as modifies the unit tests to use the new Generators framework. I've also filed JDK-8349562 as followup work for char types. Let me know what you think!

@jaskarth
Copy link
Member Author

jaskarth commented Feb 9, 2025

I also updated the benchmark, and got these results:

                                                  Baseline                    Patch
Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units  Score    Error  Units  Improvement
VectorSubword.byteToInt      1024  avgt   12  185.700 ± 0.798  ns/op   37.427 ± 0.276  ns/op  (4.96x)
VectorSubword.byteToShort    1024  avgt   12  240.737 ± 1.087  ns/op   23.094 ± 0.502  ns/op  (10.42x)
VectorSubword.intToByte      1024  avgt   12  181.680 ± 0.553  ns/op   49.873 ± 1.613  ns/op  (3.64x)
VectorSubword.intToShort     1024  avgt   12  176.256 ± 1.414  ns/op   43.933 ± 4.310  ns/op  (4.01x)
VectorSubword.shortToByte    1024  avgt   12  245.600 ± 6.217  ns/op   28.426 ± 0.649  ns/op  (8.64x)
VectorSubword.shortToInt     1024  avgt   12  178.364 ± 2.921  ns/op   34.140 ± 0.229  ns/op  (5.22x)

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaskarth That looks great! Thanks for the updates!

A little control question:
We now add additional vector operations into the loop body. How do we know this will not lead to a regression in some cases? I think it should not... right? Are the casts no-ops, or could they have some cost to them? Could there be any case where the wins from vectorization would not make up for the extra vector cast node?

Comment on lines 275 to 284
switch (from_bt) {
case T_INT:
case T_SHORT:
case T_BYTE: {
return to_bt == T_INT || to_bt == T_SHORT || to_bt == T_BYTE;
}
default: {
return false;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could leave a quick comment about why CHAR is not yet covered here.

_vlen(vlen), _from_bt(from_bt), _to_bt(to_bt) {}
virtual VTransformApplyResult apply(const VLoopAnalyzer& vloop_analyzer,
const GrowableArray<Node*>& vnode_idx_to_transformed_node) const override;
NOT_PRODUCT(virtual const char* name() const override { return "Cast"; };)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NOT_PRODUCT(virtual const char* name() const override { return "Cast"; };)
NOT_PRODUCT(virtual const char* name() const override { return "CastVector"; };)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the node is not inheriting from VTransformVectorNode, but we can make that happen with a future refactoring I'm already working on.

applyIfPlatform = {"64-bit", "true"},
applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
applyIf = {"AlignVector", "false"},
applyIfCPUFeature = {"avx", "true"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a little nit-picky. But why have a new test-file when this test here was already trying to cover the conversion cases? I think I wrote it back then, and was just too lazy to write all conversion cases. I'd suggest you move your cases up here ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I added these tests when I was reworking SuperWord::is_velt_basic_type_compatible_use_def, which you are now touching as well 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I didn't realize this test was there when I was working on my patch, but I agree it makes sense to move it there for clarity.

@eme64
Copy link
Contributor

eme64 commented Feb 17, 2025

@jaskarth just ping me whenever I should have a look again!

@jaskarth
Copy link
Member Author

@eme64 I think it should be good for another look over! I've addressed your review comments in the last commit.

About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize. The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another quick look, and have some additional questions :)


static bool is_vector_cast_supported(BasicType from_bt, BasicType to_bt) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these simply not exist, or do you just want to leave that to a future RFE so someone else can take care of this?

}
}

static bool is_vector_cast_supported(BasicType from_bt, BasicType to_bt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not live next to Matcher::match_rule_supported_vector, would that not be a better fit?

BasicType use_bt = _vloop_analyzer.types().velt_basic_type(p0);

// If the use and def types are different, emit a cast node
if (use_bt != def_bt && !p0->is_Convert() && Matcher::is_vector_cast_supported(def_bt, use_bt)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual way we check if a vector instruction is implemented is to use VectorNode::implemented. Ah, actually there is a VectorCastNode::implemented. Why are you not using that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point! I've updated the patch to use VectorCastNode::implemented instead. I think I didn't see that function originally.

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 21, 2025

@jaskarth This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@eme64
Copy link
Contributor

eme64 commented Mar 21, 2025

@jaskarth Let me know if there is anything we can help you with here :)

@openjdk
Copy link

openjdk bot commented Apr 2, 2025

@jaskarth this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout vectorize-subword
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed rfr Pull request is ready for review labels Apr 2, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 2, 2025
@jaskarth
Copy link
Member Author

I've merged from master and updated the tests to support the changes from JDK-8350177. Now the non-truncating nodes can compile again :) I've also added the changes from the code review. Let me know what you all think!

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more comments. This is really exciting that these cases could soon work! Thanks for working on it 😊

Comment on lines 2417 to 2422
bool SuperWord::is_supported_subword_cast(BasicType def_bt, BasicType use_bt, const uint pack_size) {
assert(def_bt != use_bt, "use and def types must be different");

// Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here.
return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we discussed this before: should we not move this to VectorCastNode, rather than having it in SuperWord?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the function to VectorCastNode, I think it's a better fit there because the other cast functions are located there as well.

BasicType use_bt = _vloop_analyzer.types().velt_basic_type(p0);

// If the use and def types are different, emit a cast node
if (use_bt != def_bt && !p0->is_Convert() && SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size()) really a true condition that you need to check here (and if false we can continue in the "else"), or should it be rather an assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this condition is indeed needed, since for example & 1 produces a TypeInt with a basic type of boolean, which would be otherwise harmless with the current logic but would trip the assert.

Comment on lines 512 to 513
@IR(applyIfCPUFeature = { "avx", "true" },
applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think these would be supported with asimd as well?
If you just cannot test with it feel free to file an RFE and then I can find someone to take care of it (e.g. as a starter bug).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit of testing on aarch64 hardware and have updated the tests to include asimd where supported.


@Test
@IR(counts = { IRNode.STORE_VECTOR, "=0" })
@IR(applyIfCPUFeature = { "avx2", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can make the vector size more precise here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, I've replaced the any sizes with min(int, <subword>) which should continue to be portable across architectures.

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 24, 2025

@jaskarth This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 23, 2025

@jaskarth This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Oct 23, 2025
@jaskarth
Copy link
Member Author

/open

@openjdk openjdk bot reopened this Jan 23, 2026
@openjdk
Copy link

openjdk bot commented Jan 23, 2026

@jaskarth This pull request is now open

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jan 23, 2026
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 23, 2026
@jaskarth
Copy link
Member Author

Hi @eme64, I apologize for the long delay, but I have some more time to work on this! I've merged from master, updated the tests, and addressed the previous code review concerns. I would love to know what you think!

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaskarth Wow, I just realized how big the impact of this PR is, by the number of IR rules you were able to adjust. Very exciting!

I left quite a few comments below, but only 3 are about the VM code, so we are not far from the finish line :)

The rest is more about tracking future work. If you don't have the time to file the issues just let me know, and I can file some RFEs for tracking :)

One major improvement for the future, would be to track down the cases where we now cast from subword->int, then do int-ops, and cast int->subword. This loses us a factor of 2 or 4 with the vector length and introduces more ops we probably don't always need. But optimizing this could be quite a big task, so not a high priority. But we should file an issue for it for sure :)

Update: Filed:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int

// If the use and def types are different, emit a cast node
if (use_bt != def_bt && !p0->is_Convert() && VectorCastNode::is_supported_subword_cast(def_bt, use_bt, pack->size())) {
VTransformNode* in = get_vtnode(pack_in->at(0));
VTransformNode* cast = new (_vtransform.arena()) VTransformCastVectorNode(_vtransform, pack->size(), def_bt, use_bt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed: above, we already handle a cast case, but use VTransformElementWiseVectorNode:
https://github.com/openjdk/jdk/pull/23413/files#diff-cd8469676c3f287680696b4dbd87fd02b765f2c9a249bd485c55613b15843435L213-L217

I'm not happy with using VTransformElementWiseVectorNode for some casts and VTransformCastVectorNode for others. So I see 2 options:

  • Use VTransformCastVectorNode for both, refactor the code I linded.
  • Somehow try to remove VTransformCastVectorNode, and use VTransformElementWiseVectorNode here. Do you think that would be possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great suggestion! I didn't realize we already had VTransformElementWiseVectorNode which works in this case when provided the vector opcode. That means VTransformCastVectorNode isn't needed, which reduces the size of the patch significantly.


@Test
@IR(failOn = {IRNode.MIN_VI, IRNode.MIN_VF, IRNode.MIN_VD})
@IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could get more precise vector size here as well, using IRNode.VECTOR_SIZE + "min(max_int, max_short)" as you did in the other test :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up, I missed these when updating the tests!

Comment on lines 458 to 464
@IR(counts = {IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE + "min(max_int, max_byte)", "> 0",
IRNode.AND_REDUCTION_V, "> 0",
IRNode.AND_VI, "> 0"},
applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"},
applyIf = {"AutoVectorizationOverrideProfitability", "> 0"})
@IR(failOn = IRNode.LOAD_VECTOR_B,
applyIf = {"AutoVectorizationOverrideProfitability", "= 0"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I think I had not noticed this before! This is actually a great win already. Though we could still do better by not casting to int, and rather staying in byte.

I now filed
JDK-8376176: C2 SuperWord: implement/improve subword reductions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int

Comment on lines +76 to +77
@IR(counts = { IRNode.LOAD_VECTOR_S, IRNode.VECTOR_SIZE + "min(max_int, max_short)", "> 0" },
applyIfCPUFeatureOr = { "avx2", "true", "asimd", "true" })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how about here? Could we optimize and remove the casts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't be able to remove the casts in this case because CountLeadingZeros doesn't support truncation, so the cast to and from int is required for correctness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, got it. Makes sense :)

// Note that min operations on subword types cannot be vectorized
// because higher bits will be lost.
@IR(failOn = {IRNode.STORE_VECTOR})
@IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the size more precise, please? :)

I suspect we might be able to eventually implement this with a short min, rather than a int min?

@eme64
Copy link
Contributor

eme64 commented Jan 23, 2026

Ok, I filed this as another follow-up:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int

@jaskarth
Copy link
Member Author

Hi @eme64, thanks a lot for the review! I've pushed an update that should address the review comments and update the bug annotations and copyright years. About the cast to and from int, the only places where that should be required is when the node doesn't support truncation. Right now it looks like reductions also cast to int even when they're not required, such as with AndReduction. I can do some further investigation in a followup patch to see where the int vectors are generated.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, it now looks good to me :)

I'll run some internal testing. Please wait with for the results before integrating. But you need to get second review anyway.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 27, 2026
@eme64
Copy link
Contributor

eme64 commented Jan 27, 2026

@jaskarth Something is going a bit strange with my testing script. Could you merge with latest master, maybe that helps?

assert(def_bt != use_bt, "use and def types must be different");

// Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here.
return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we specifically require it to be a subword. If you mean this is only called with one of the two being a subword then can we use an assert instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, there are test cases for int<->long conversions (testIntToLong/testLongToInt). How do they work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merykitty Since this function is called from places that may not always have a subword type, changing it to an assert causes spurious assertion failures.

@iwanowww Scalar int<->long conversions are modeled with ConvI2L and ConvL2I, the existing superword mechanism is able to vectorize the conversion without needing to dynamically add new nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me an example where this is the case? And in that case, can we remove the is_subword_type condition and check the availability of a cast regardless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call it in two places, in SuperWord::is_velt_basic_type_compatible_use_def, where the compatibility of the packs is determined, and in SuperWordVTransformBuilder::get_or_make_vtnode_vector_input_at_index, where the VTransform graph is built and the cast is dynamically inserted. In both cases the function can be called with different basic types based on the use/def edge being examined, so this filters to just edges that involve subwords to ensure that it's not changing the graph for other types. Since the graph should already be valid for converts between non-subword types, I wanted to avoid the possibility of inserting extra casts. I experimented with removing the subword checks for the SuperWord function but it seems to lead to a miscompilation in the unit tests with a wrong expected value, which I haven't yet debugged in detail.

@jaskarth
Copy link
Member Author

Thanks for the ping @eme64, I've merged from master.

@eme64
Copy link
Contributor

eme64 commented Jan 28, 2026

@jaskarth Thanks, now the script succeeded and tests are launched :)

@eme64
Copy link
Contributor

eme64 commented Jan 28, 2026

@jaskarth I'm seing a first failure on -XX:UseAVX=0.

  • compiler/loopopts/superword/TestReductions.java#vanilla
  • compiler/loopopts/superword/TestReductions.java#force-vectorization
    For both, about 30 tests have IR failures.

One example:

Failed IR Rules (30) of Methods (30)
------------------------------------
1) Method "private static byte compiler.loopopts.superword.TestReductions.byteAndBig()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "_@min(max_int, max_byte)", "> 0", "_#AND_REDUCTION_V#_", "> 0", "_#V#AND_VI#_", "> 0"}, applyIfPlatform={}, failOn={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"AutoVectorizationOverrideProfitability", "> 0"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<B,4>)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(AndReductionV.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
         * Constraint 3: "(\\d+(\\s){2}(AndV.*)+(\\s){2}===.*vector[A-Za-z]<I,4>)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

This should be easy to reproduce on any x64 machine with the -XX:UseAVX=0 flag. Maybe some instruction is not implemented for vectorization, and you need to require at least avx instead of sse4.1?

@eme64
Copy link
Contributor

eme64 commented Jan 30, 2026

But this is just a very minor failure, so the review can continue next to this.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 2, 2026
@jaskarth
Copy link
Member Author

jaskarth commented Feb 2, 2026

@eme64 Thanks for running the tests! I think using sse4.1 here instead of avx was indeed a mistake here, and it now passes for me with UseAVX=0. I've pushed a commit fixing the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

6 participants