8342095: Add autovectorizer support for subword vector casts#23413
8342095: Add autovectorizer support for subword vector casts#23413jaskarth wants to merge 22 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back jkarthikeyan! A progress list of the required criteria for merging this PR into |
|
@jaskarth This change is no longer ready for integration - check the PR body for details. |
Webrevs
|
|
|
||
| /* | ||
| * @test | ||
| * @bug 8183390 8340010 8342095 |
There was a problem hiding this comment.
Review note: This test was missing a @bug annotation, so I went through the history and added the ones that substantially modified the test.
eme64
left a comment
There was a problem hiding this comment.
Great work, looks generally amazing 😊
I left a few comments below.
test/hotspot/jtreg/compiler/loopopts/superword/TestSubwordVectorization.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/loopopts/superword/TestSubwordVectorization.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java
Show resolved
Hide resolved
|
@eme64 I've pushed a new version that implements widening, as well as modifies the unit tests to use the new |
|
I also updated the benchmark, and got these results: Baseline Patch
Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement
VectorSubword.byteToInt 1024 avgt 12 185.700 ± 0.798 ns/op 37.427 ± 0.276 ns/op (4.96x)
VectorSubword.byteToShort 1024 avgt 12 240.737 ± 1.087 ns/op 23.094 ± 0.502 ns/op (10.42x)
VectorSubword.intToByte 1024 avgt 12 181.680 ± 0.553 ns/op 49.873 ± 1.613 ns/op (3.64x)
VectorSubword.intToShort 1024 avgt 12 176.256 ± 1.414 ns/op 43.933 ± 4.310 ns/op (4.01x)
VectorSubword.shortToByte 1024 avgt 12 245.600 ± 6.217 ns/op 28.426 ± 0.649 ns/op (8.64x)
VectorSubword.shortToInt 1024 avgt 12 178.364 ± 2.921 ns/op 34.140 ± 0.229 ns/op (5.22x) |
eme64
left a comment
There was a problem hiding this comment.
@jaskarth That looks great! Thanks for the updates!
A little control question:
We now add additional vector operations into the loop body. How do we know this will not lead to a regression in some cases? I think it should not... right? Are the casts no-ops, or could they have some cost to them? Could there be any case where the wins from vectorization would not make up for the extra vector cast node?
src/hotspot/cpu/x86/matcher_x86.hpp
Outdated
| switch (from_bt) { | ||
| case T_INT: | ||
| case T_SHORT: | ||
| case T_BYTE: { | ||
| return to_bt == T_INT || to_bt == T_SHORT || to_bt == T_BYTE; | ||
| } | ||
| default: { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
You could leave a quick comment about why CHAR is not yet covered here.
| _vlen(vlen), _from_bt(from_bt), _to_bt(to_bt) {} | ||
| virtual VTransformApplyResult apply(const VLoopAnalyzer& vloop_analyzer, | ||
| const GrowableArray<Node*>& vnode_idx_to_transformed_node) const override; | ||
| NOT_PRODUCT(virtual const char* name() const override { return "Cast"; };) |
There was a problem hiding this comment.
| NOT_PRODUCT(virtual const char* name() const override { return "Cast"; };) | |
| NOT_PRODUCT(virtual const char* name() const override { return "CastVector"; };) |
There was a problem hiding this comment.
I know the node is not inheriting from VTransformVectorNode, but we can make that happen with a future refactoring I'm already working on.
| applyIfPlatform = {"64-bit", "true"}, | ||
| applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"}) | ||
| applyIf = {"AlignVector", "false"}, | ||
| applyIfCPUFeature = {"avx", "true"}) |
There was a problem hiding this comment.
This may be a little nit-picky. But why have a new test-file when this test here was already trying to cover the conversion cases? I think I wrote it back then, and was just too lazy to write all conversion cases. I'd suggest you move your cases up here ;)
There was a problem hiding this comment.
I think I added these tests when I was reworking SuperWord::is_velt_basic_type_compatible_use_def, which you are now touching as well 😊
There was a problem hiding this comment.
I think I didn't realize this test was there when I was working on my patch, but I agree it makes sense to move it there for clarity.
|
@jaskarth just ping me whenever I should have a look again! |
|
@eme64 I think it should be good for another look over! I've addressed your review comments in the last commit. About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize. The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet. |
eme64
left a comment
There was a problem hiding this comment.
I had another quick look, and have some additional questions :)
|
|
||
| static bool is_vector_cast_supported(BasicType from_bt, BasicType to_bt) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Do these simply not exist, or do you just want to leave that to a future RFE so someone else can take care of this?
src/hotspot/cpu/x86/matcher_x86.hpp
Outdated
| } | ||
| } | ||
|
|
||
| static bool is_vector_cast_supported(BasicType from_bt, BasicType to_bt) { |
There was a problem hiding this comment.
Why does this not live next to Matcher::match_rule_supported_vector, would that not be a better fit?
| BasicType use_bt = _vloop_analyzer.types().velt_basic_type(p0); | ||
|
|
||
| // If the use and def types are different, emit a cast node | ||
| if (use_bt != def_bt && !p0->is_Convert() && Matcher::is_vector_cast_supported(def_bt, use_bt)) { |
There was a problem hiding this comment.
The usual way we check if a vector instruction is implemented is to use VectorNode::implemented. Ah, actually there is a VectorCastNode::implemented. Why are you not using that one?
There was a problem hiding this comment.
This is a good point! I've updated the patch to use VectorCastNode::implemented instead. I think I didn't see that function originally.
|
@jaskarth This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
|
@jaskarth Let me know if there is anything we can help you with here :) |
|
@jaskarth this pull request can not be integrated into git checkout vectorize-subword
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
|
I've merged from master and updated the tests to support the changes from JDK-8350177. Now the non-truncating nodes can compile again :) I've also added the changes from the code review. Let me know what you all think! |
eme64
left a comment
There was a problem hiding this comment.
I have a few more comments. This is really exciting that these cases could soon work! Thanks for working on it 😊
src/hotspot/share/opto/superword.cpp
Outdated
| bool SuperWord::is_supported_subword_cast(BasicType def_bt, BasicType use_bt, const uint pack_size) { | ||
| assert(def_bt != use_bt, "use and def types must be different"); | ||
|
|
||
| // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. | ||
| return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); | ||
| } |
There was a problem hiding this comment.
Not sure if we discussed this before: should we not move this to VectorCastNode, rather than having it in SuperWord?
There was a problem hiding this comment.
I've moved the function to VectorCastNode, I think it's a better fit there because the other cast functions are located there as well.
| BasicType use_bt = _vloop_analyzer.types().velt_basic_type(p0); | ||
|
|
||
| // If the use and def types are different, emit a cast node | ||
| if (use_bt != def_bt && !p0->is_Convert() && SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())) { |
There was a problem hiding this comment.
Is SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size()) really a true condition that you need to check here (and if false we can continue in the "else"), or should it be rather an assert?
There was a problem hiding this comment.
I believe this condition is indeed needed, since for example & 1 produces a TypeInt with a basic type of boolean, which would be otherwise harmless with the current logic but would trip the assert.
| @IR(applyIfCPUFeature = { "avx", "true" }, | ||
| applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"}, |
There was a problem hiding this comment.
Do you think these would be supported with asimd as well?
If you just cannot test with it feel free to file an RFE and then I can find someone to take care of it (e.g. as a starter bug).
There was a problem hiding this comment.
I did a bit of testing on aarch64 hardware and have updated the tests to include asimd where supported.
|
|
||
| @Test | ||
| @IR(counts = { IRNode.STORE_VECTOR, "=0" }) | ||
| @IR(applyIfCPUFeature = { "avx2", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) |
There was a problem hiding this comment.
Do you think we can make the vector size more precise here?
There was a problem hiding this comment.
This is a good idea, I've replaced the any sizes with min(int, <subword>) which should continue to be portable across architectures.
|
@jaskarth This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
@jaskarth This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
|
/open |
|
@jaskarth This pull request is now open |
|
Hi @eme64, I apologize for the long delay, but I have some more time to work on this! I've merged from master, updated the tests, and addressed the previous code review concerns. I would love to know what you think! |
There was a problem hiding this comment.
@jaskarth Wow, I just realized how big the impact of this PR is, by the number of IR rules you were able to adjust. Very exciting!
I left quite a few comments below, but only 3 are about the VM code, so we are not far from the finish line :)
The rest is more about tracking future work. If you don't have the time to file the issues just let me know, and I can file some RFEs for tracking :)
One major improvement for the future, would be to track down the cases where we now cast from subword->int, then do int-ops, and cast int->subword. This loses us a factor of 2 or 4 with the vector length and introduces more ops we probably don't always need. But optimizing this could be quite a big task, so not a high priority. But we should file an issue for it for sure :)
Update: Filed:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int
| // If the use and def types are different, emit a cast node | ||
| if (use_bt != def_bt && !p0->is_Convert() && VectorCastNode::is_supported_subword_cast(def_bt, use_bt, pack->size())) { | ||
| VTransformNode* in = get_vtnode(pack_in->at(0)); | ||
| VTransformNode* cast = new (_vtransform.arena()) VTransformCastVectorNode(_vtransform, pack->size(), def_bt, use_bt); |
There was a problem hiding this comment.
I just noticed: above, we already handle a cast case, but use VTransformElementWiseVectorNode:
https://github.com/openjdk/jdk/pull/23413/files#diff-cd8469676c3f287680696b4dbd87fd02b765f2c9a249bd485c55613b15843435L213-L217
I'm not happy with using VTransformElementWiseVectorNode for some casts and VTransformCastVectorNode for others. So I see 2 options:
- Use
VTransformCastVectorNodefor both, refactor the code I linded. - Somehow try to remove
VTransformCastVectorNode, and useVTransformElementWiseVectorNodehere. Do you think that would be possible?
There was a problem hiding this comment.
This is a great suggestion! I didn't realize we already had VTransformElementWiseVectorNode which works in this case when provided the vector opcode. That means VTransformCastVectorNode isn't needed, which reduces the size of the patch significantly.
|
|
||
| @Test | ||
| @IR(failOn = {IRNode.MIN_VI, IRNode.MIN_VF, IRNode.MIN_VD}) | ||
| @IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) |
There was a problem hiding this comment.
I think you could get more precise vector size here as well, using IRNode.VECTOR_SIZE + "min(max_int, max_short)" as you did in the other test :)
There was a problem hiding this comment.
Thanks for the heads up, I missed these when updating the tests!
| @IR(counts = {IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE + "min(max_int, max_byte)", "> 0", | ||
| IRNode.AND_REDUCTION_V, "> 0", | ||
| IRNode.AND_VI, "> 0"}, | ||
| applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"}, | ||
| applyIf = {"AutoVectorizationOverrideProfitability", "> 0"}) | ||
| @IR(failOn = IRNode.LOAD_VECTOR_B, | ||
| applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) |
There was a problem hiding this comment.
Wow, I think I had not noticed this before! This is actually a great win already. Though we could still do better by not casting to int, and rather staying in byte.
I now filed
JDK-8376176: C2 SuperWord: implement/improve subword reductions
There was a problem hiding this comment.
Related:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int
test/hotspot/jtreg/compiler/vectorization/TestRotateByteAndShortVector.java
Show resolved
Hide resolved
| @IR(counts = { IRNode.LOAD_VECTOR_S, IRNode.VECTOR_SIZE + "min(max_int, max_short)", "> 0" }, | ||
| applyIfCPUFeatureOr = { "avx2", "true", "asimd", "true" }) |
There was a problem hiding this comment.
And how about here? Could we optimize and remove the casts?
There was a problem hiding this comment.
Filed:
JDK-8376179: C2 SuperWord: improve subword vectorization, avoid cast to-and-from int
There was a problem hiding this comment.
We wouldn't be able to remove the casts in this case because CountLeadingZeros doesn't support truncation, so the cast to and from int is required for correctness.
There was a problem hiding this comment.
Ah ok, got it. Makes sense :)
test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java
Show resolved
Hide resolved
| // Note that min operations on subword types cannot be vectorized | ||
| // because higher bits will be lost. | ||
| @IR(failOn = {IRNode.STORE_VECTOR}) | ||
| @IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) |
There was a problem hiding this comment.
Can we make the size more precise, please? :)
I suspect we might be able to eventually implement this with a short min, rather than a int min?
|
Ok, I filed this as another follow-up: |
|
Hi @eme64, thanks a lot for the review! I've pushed an update that should address the review comments and update the bug annotations and copyright years. About the cast to and from int, the only places where that should be required is when the node doesn't support truncation. Right now it looks like reductions also cast to int even when they're not required, such as with |
eme64
left a comment
There was a problem hiding this comment.
Excellent, it now looks good to me :)
I'll run some internal testing. Please wait with for the results before integrating. But you need to get second review anyway.
|
@jaskarth Something is going a bit strange with my testing script. Could you merge with latest master, maybe that helps? |
| assert(def_bt != use_bt, "use and def types must be different"); | ||
|
|
||
| // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. | ||
| return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); |
There was a problem hiding this comment.
Why do we specifically require it to be a subword. If you mean this is only called with one of the two being a subword then can we use an assert instead?
There was a problem hiding this comment.
Moreover, there are test cases for int<->long conversions (testIntToLong/testLongToInt). How do they work?
There was a problem hiding this comment.
@merykitty Since this function is called from places that may not always have a subword type, changing it to an assert causes spurious assertion failures.
@iwanowww Scalar int<->long conversions are modeled with ConvI2L and ConvL2I, the existing superword mechanism is able to vectorize the conversion without needing to dynamically add new nodes.
There was a problem hiding this comment.
Can you give me an example where this is the case? And in that case, can we remove the is_subword_type condition and check the availability of a cast regardless?
There was a problem hiding this comment.
We call it in two places, in SuperWord::is_velt_basic_type_compatible_use_def, where the compatibility of the packs is determined, and in SuperWordVTransformBuilder::get_or_make_vtnode_vector_input_at_index, where the VTransform graph is built and the cast is dynamically inserted. In both cases the function can be called with different basic types based on the use/def edge being examined, so this filters to just edges that involve subwords to ensure that it's not changing the graph for other types. Since the graph should already be valid for converts between non-subword types, I wanted to avoid the possibility of inserting extra casts. I experimented with removing the subword checks for the SuperWord function but it seems to lead to a miscompilation in the unit tests with a wrong expected value, which I haven't yet debugged in detail.
|
Thanks for the ping @eme64, I've merged from master. |
|
@jaskarth Thanks, now the script succeeded and tests are launched :) |
|
@jaskarth I'm seing a first failure on
One example: This should be easy to reproduce on any |
|
But this is just a very minor failure, so the review can continue next to this. |
|
@eme64 Thanks for running the tests! I think using sse4.1 here instead of avx was indeed a mistake here, and it now passes for me with UseAVX=0. I've pushed a commit fixing the bug. |
Hi all,
This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the
VTransformgraph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existingVectorCastX2Ylogic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413$ git checkout pull/23413Update a local copy of the PR:
$ git checkout pull/23413$ git pull https://git.openjdk.org/jdk.git pull/23413/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 23413View PR using the GUI difftool:
$ git pr show -t 23413Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23413.diff
Using Webrev
Link to Webrev Comment