8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota #81
base: vectorIntrinsics
👋 Welcome back whuang! A progress list of the required criteria for merging this PR into
/contributor add Wang Huang whuang@openjdk.org
 * @test
 * @bug 8266720
 * @modules jdk.incubator.vector
 * @run testng/othervm compiler.vectorapi.TestVectorShuffleIotaByte1024
Perhaps the test can be annotated, declaring it should only execute on ARM/SVE platforms. See the use of the @requires clause in other JDK tests.
Thank you for your review. I think this test applies to any architecture that has ByteVector.SPECIES_MAX == 1024.
But we know which architectures don't support it: x86, PPC, etc.
I am unsure why the existing shuffle tests do not catch this problem. In fact, I would prefer we focus on that if we can, rather than adding a specific test. Would you mind looking to see if we can expand on the existing shuffleTest?
- Q: Why do the existing shuffle tests not catch this problem?
- A: Because we need vector_length >= 1024. However, on x86 we don't have this environment, because the longest x86 register is 512 bits (AVX-512).
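The lane-count arithmetic behind this answer can be sketched in plain Java. This is only an illustrative model: the class and method names (`ByteLaneCount`, `laneCount`, `fitsInSignedByte`) are hypothetical and are not part of the JDK or of this PR.

```java
// Hypothetical helper for illustration; not part of the JDK or this PR.
class ByteLaneCount {
    // Number of byte lanes in a vector of the given bit width.
    static int laneCount(int vectorBits) {
        return vectorBits / Byte.SIZE;
    }

    // True if the lane count is representable as a non-negative signed byte.
    static boolean fitsInSignedByte(int laneCount) {
        return laneCount <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {128, 256, 512, 1024}) {
            int lanes = laneCount(bits);
            System.out.println(bits + "-bit vector: " + lanes + " byte lanes"
                    + ", fits in signed byte: " + fitsInSignedByte(lanes)
                    + ", value after narrowing to byte: " + (byte) lanes);
        }
    }
}
```

Up to 512 bits the byte lane count is at most 64 and stays within the signed byte range. At 1024 bits it is 128, which narrows to -128 when stored in a byte lane, so a 512-bit machine would not be expected to exercise the failing case.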
@@ -453,10 +453,10 @@ bool LibraryCallKit::inline_vector_shuffle_iota() {
     // Wrap the indices greater than lane count.
     res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
   } else {
-    ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
+    ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ugt));
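To see why the predicate matters, here is a scalar, single-lane model of the two comparisons in plain Java. It is a sketch under assumptions: the class `ShuffleIotaCompare` and its methods are hypothetical, and the operand order (lane count on the left, index on the right) is assumed from the `VectorMaskCmpNode` call quoted later in this thread, not confirmed for the patched code.

```java
// Scalar model of one byte lane; names are hypothetical, not HotSpot code.
class ShuffleIotaCompare {
    // Signed test "laneCnt >= idx", modelling BoolTest::ge on byte lanes.
    static boolean signedGe(byte laneCnt, byte idx) {
        return laneCnt >= idx;
    }

    // Unsigned test "laneCnt > idx", modelling BoolTest::ugt on byte lanes.
    static boolean unsignedGt(byte laneCnt, byte idx) {
        return Byte.compareUnsigned(laneCnt, idx) > 0;
    }

    public static void main(String[] args) {
        byte laneCnt = (byte) 128; // 128 byte lanes in 1024 bits; narrows to -128
        byte idx = 5;              // an index that is clearly in range
        System.out.println("signed ge:   " + signedGe(laneCnt, idx));
        System.out.println("unsigned gt: " + unsignedGt(laneCnt, idx));
    }
}
```

With 128 lanes the broadcast lane count is -128 as a signed byte, so the signed test rejects an in-range index such as 5, while the unsigned test treats the same bit pattern as 128 and accepts it.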
Unsigned comparison adds overhead and is not supported on all architectures.
After exchanging notes with @XiaohongGong, I think we can also fix it like this:
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
Node * lane_cnt_tmp = gvn().makecon(TypeInt::make(num_elem - 1));
Node * bcast_lane_cnt_tmp = gvn().transform(VectorNode::scalar2vector(lane_cnt_tmp, num_elem, type_bt));
Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt_tmp, res, pred_node, vt));
// Make the indices greater than lane count as -ve values. This matches the java side implementation.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); // Add a mov & bcast here
Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
Node * biased_val = gvn().transform(VectorNode::make(Op_SubI, res, bcast_lane_cnt, num_elem, elem_bt));
res = gvn().transform(new VectorBlendNode(biased_val, res, mask));
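The per-lane effect of this sequence can be modelled with scalar Java. This is a sketch, not HotSpot code: `ShuffleIotaBlendModel` is a hypothetical name, `bcast_mod` is assumed to hold `num_elem - 1` (its definition is not shown in this thread), `num_elem` is assumed to be a power of two, and the blend is assumed to pick its second input where the mask is set.

```java
// Scalar, per-lane model of the IR sequence above; names are hypothetical.
class ShuffleIotaBlendModel {
    // Assumes numElem is a power of two, so (numElem - 1) works as a wrap mask.
    static int lane(int idx, int numElem) {
        boolean inRange = (numElem - 1) >= idx; // VectorMaskCmp(ge, bcast_lane_cnt_tmp, res)
        int wrapped = idx & (numElem - 1);      // AndI with bcast_mod (assumed numElem - 1)
        int biased = wrapped - numElem;         // SubI with bcast_lane_cnt
        return inRange ? wrapped : biased;      // blend: wrapped value where the mask is set
    }

    public static void main(String[] args) {
        int numElem = 8;
        for (int idx : new int[] {0, 3, 7, 8, 10}) {
            System.out.println("idx " + idx + " -> " + lane(idx, numElem));
        }
    }
}
```

In this model, in-range indices pass through unchanged and out-of-range indices become negative. For 128 byte lanes the compared constant is 127, which still fits in a signed byte, which appears to be why this variant can keep the signed `ge` predicate for 1024-bit vectors.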
> Unsigned comparison adds overhead and is not supported on all architectures.
However, if we don't use ugt, we will run into the same problem if the vector length exceeds 1024 bits in the future. Changing < num_elem to <= 128 only solves the 1024-bit case itself. If num_elem > 128, it will be invalid.
Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.
> Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.
If so, should we at least add some comments, or even a length check, so that we do not inline for unsupported vector lengths?
Agree with @nsjian! Thanks!
If we use ge 1024, we will add two more nodes. Is that an extra cost here?
// Currently it works well for vector_length <= 1024-bits.
// for vector_length > 1024, we don't support now
Does it need any vector length check or block for vector length >1024?
Thank you for your review. That is a problem here. I will address it in the next commit.
As discussed with @nsjian, can we make 1024-bit vectors unsupported as well for now? No IR needs to change if 1024 bits is not supported. We can revisit this issue in the future for vector lengths >= 1024 bits.
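@Wanghuang-Huawei this pull request can not be integrated into vectorIntrinsics due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8266720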
git fetch https://git.openjdk.org/panama-vector vectorIntrinsics
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge vectorIntrinsics"
git push
Because JDK-8266317 has not been merged into jdk/jdk, I fix this bug here.
x86 is not affected, because x86 does not have a 1024-bit vector length.
Contributors
<whuang@openjdk.org>
<aijiaming1@huawei.com>
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/81/head:pull/81
$ git checkout pull/81
Update a local copy of the PR:
$ git checkout pull/81
$ git pull https://git.openjdk.org/panama-vector.git pull/81/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 81
View PR using the GUI difftool:
$ git pr show -t 81
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/81.diff