Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota #81

Open
wants to merge 4 commits into
base: vectorIntrinsics
Choose a base branch
from

Conversation

Wanghuang-Huawei
Copy link
Collaborator

@Wanghuang-Huawei Wanghuang-Huawei commented May 12, 2021

Because JDK-8266317 has not been merged into jdk/jdk. So I fix this bug here.

  • This comparsion should be a unsigned cmp
  • x86 is not wrong because x86 does not have 1024-bits vl here.
  ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge)); // should BoolTest::ugt
  Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
  Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
  // should BoolTest::ugt
  Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Error

 ⚠️ Executable files are not allowed (file: src/jdk.incubator.vector/share/classes/jdk/incubator/vector/gen-src.sh)

Integration blockers

 ⚠️ Executable files are not allowed (file: src/jdk.incubator.vector/share/classes/jdk/incubator/vector/gen-src.sh) (failed with the updated jcheck configuration)
 ⚠️ Whitespace errors (failed with the updated jcheck configuration)

Issue

  • JDK-8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota (Bug - P2)

Contributors

  • Wang Huang <whuang@openjdk.org>
  • Ai Jiaming <aijiaming1@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/81/head:pull/81
$ git checkout pull/81

Update a local copy of the PR:
$ git checkout pull/81
$ git pull https://git.openjdk.org/panama-vector.git pull/81/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 81

View PR using the GUI difftool:
$ git pr show -t 81

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/81.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 12, 2021

👋 Welcome back whuang! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@Wanghuang-Huawei
Copy link
Collaborator Author

/contributor add Wang Huang whuang@openjdk.org
/contributor add Ai Jiaming aijiaming1@huawei.com

@openjdk
Copy link

openjdk bot commented May 12, 2021

@Wanghuang-Huawei
Contributor Wang Huang <whuang@openjdk.org> successfully added.

@openjdk
Copy link

openjdk bot commented May 12, 2021

@Wanghuang-Huawei
Contributor Ai Jiaming <aijiaming1@huawei.com> successfully added.

@openjdk openjdk bot added the rfr label May 12, 2021
@mlbridge
Copy link

mlbridge bot commented May 12, 2021

Webrevs

* @test
* @bug 8266720
* @modules jdk.incubator.vector
* @run testng/othervm compiler.vectorapi.TestVectorShuffleIotaByte1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the test can be annotated, declaring it should only execute on ARM/SVE platforms. See the use of the @requires clause used in other JDK tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for you review. I think this test is used for any arch which has ByteVector.SPECIES_MAX == 1024.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we know which arches don't support, x86, PPC etc.

I am unsure why existing shuffle tests do not catch this problem. In fact i would prefer we focus on that if we can rather than adding a specific test. Would you mind looking to see if see if we can expand on the existing shuffleTest?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Q : Why existing shuffle tests do not catch this problem?
  • A: Because we need vector_length >= 1024. However, in x86 we don't have this env because the longest register of x86 is 512 in AVX512.

@@ -453,10 +453,10 @@ bool LibraryCallKit::inline_vector_shuffle_iota() {
// Wrap the indices greater than lane count.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
} else {
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ugt));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsigned comparison adds overhead and is not supported on all architectures.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing notes with @XiaohongGong , I think we can also fix like this:

    ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
    Node * lane_cnt_tmp  = gvn().makecon(TypeInt::make(num_elem - 1));
    Node * bcast_lane_cnt_tmp = gvn().transform(VectorNode::scalar2vector(lane_cnt_tmp, num_elem, type_bt));
    Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt_tmp, res, pred_node, vt));

    // Make the indices greater than lane count as -ve values. This matches the java side implementation.
    res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
    Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem)); // Add a mov & bcast here
    Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
    Node * biased_val = gvn().transform(VectorNode::make(Op_SubI, res, bcast_lane_cnt, num_elem, elem_bt));
    res = gvn().transform(new VectorBlendNode(biased_val, res, mask));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsigned comparison adds overhead and is not supported on all architectures.

However, if we don't use ugt ,we will encounter problem if length > 1024 in future. Changing < num_elem to <= 128 is just a solution to 1024 itself. If num_elem > 128, it will be invalid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.

Copy link
Collaborator

@nsjian nsjian May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.

If so, we need at least some comments or even length check to not inline for unsupported vector lengths?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @nsjian ! Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use ge 1024 , we will add two more nodes. It is a extra cost here?

Comment on lines +466 to +467
// Currently it works well for vector_length <= 1024-bits.
// for vector_length > 1024, we don't support now
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need any vector length check or block for vector length >1024?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. It is a problem here. I will do that in next commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @nsjian , can we currently make it unsupported as well for 1024-bits? It doesn't need to change any IR if 1024-bits is not supported. We can revisit this issue in future for vector length >= 1024-bits

@openjdk
Copy link

openjdk bot commented Feb 17, 2023

@Wanghuang-Huawei this pull request can not be integrated into vectorIntrinsics due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8266720
git fetch https://git.openjdk.org/panama-vector vectorIntrinsics
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge vectorIntrinsics"
git push

@openjdk openjdk bot added merge-conflict and removed rfr labels Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants