Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota #81

Open
wants to merge 4 commits into
base: vectorIntrinsics
Choose a base branch
from

Conversation

@Wanghuang-Huawei
Copy link
Collaborator

@Wanghuang-Huawei Wanghuang-Huawei commented May 12, 2021

Because JDK-8266317 has not been merged into jdk/jdk. So I fix this bug here.

  • This comparsion should be a unsigned cmp
  • x86 is not wrong because x86 does not have 1024-bits vl here.
  ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge)); // should BoolTest::ugt
  Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
  Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
  // should BoolTest::ugt
  Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota

Contributors

  • Wang Huang <whuang@openjdk.org>
  • Ai Jiaming <aijiaming1@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/81/head:pull/81
$ git checkout pull/81

Update a local copy of the PR:
$ git checkout pull/81
$ git pull https://git.openjdk.java.net/panama-vector pull/81/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 81

View PR using the GUI difftool:
$ git pr show -t 81

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/81.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented May 12, 2021

👋 Welcome back whuang! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@Wanghuang-Huawei
Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei commented May 12, 2021

/contributor add Wang Huang whuang@openjdk.org
/contributor add Ai Jiaming aijiaming1@huawei.com

@openjdk
Copy link

@openjdk openjdk bot commented May 12, 2021

@Wanghuang-Huawei
Contributor Wang Huang <whuang@openjdk.org> successfully added.

@openjdk
Copy link

@openjdk openjdk bot commented May 12, 2021

@Wanghuang-Huawei
Contributor Ai Jiaming <aijiaming1@huawei.com> successfully added.

@openjdk openjdk bot added the rfr label May 12, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented May 12, 2021

Webrevs

* @test
* @bug 8266720
* @modules jdk.incubator.vector
* @run testng/othervm compiler.vectorapi.TestVectorShuffleIotaByte1024
Copy link
Member

@PaulSandoz PaulSandoz May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the test can be annotated, declaring it should only execute on ARM/SVE platforms. See the use of the @requires clause used in other JDK tests.

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for you review. I think this test is used for any arch which has ByteVector.SPECIES_MAX == 1024.

Copy link
Member

@PaulSandoz PaulSandoz May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we know which arches don't support, x86, PPC etc.

I am unsure why existing shuffle tests do not catch this problem. In fact i would prefer we focus on that if we can rather than adding a specific test. Would you mind looking to see if see if we can expand on the existing shuffleTest?

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Q : Why existing shuffle tests do not catch this problem?
  • A: Because we need vector_length >= 1024. However, in x86 we don't have this env because the longest register of x86 is 512 in AVX512.

@@ -453,10 +453,10 @@ bool LibraryCallKit::inline_vector_shuffle_iota() {
// Wrap the indices greater than lane count.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
} else {
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ugt));
Copy link
Collaborator

@sviswa7 sviswa7 May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsigned comparison adds overhead and is not supported on all architectures.

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing notes with @XiaohongGong , I think we can also fix like this:

    ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
    Node * lane_cnt_tmp  = gvn().makecon(TypeInt::make(num_elem - 1));
    Node * bcast_lane_cnt_tmp = gvn().transform(VectorNode::scalar2vector(lane_cnt_tmp, num_elem, type_bt));
    Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt_tmp, res, pred_node, vt));

    // Make the indices greater than lane count as -ve values. This matches the java side implementation.
    res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
    Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem)); // Add a mov & bcast here
    Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
    Node * biased_val = gvn().transform(VectorNode::make(Op_SubI, res, bcast_lane_cnt, num_elem, elem_bt));
    res = gvn().transform(new VectorBlendNode(biased_val, res, mask));

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsigned comparison adds overhead and is not supported on all architectures.

However, if we don't use ugt ,we will encounter problem if length > 1024 in future. Changing < num_elem to <= 128 is just a solution to 1024 itself. If num_elem > 128, it will be invalid.

Copy link
Collaborator

@XiaohongGong XiaohongGong May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.

Copy link
Collaborator

@nsjian nsjian May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently making it work well for <= 1024-bits makes sense to me. We can revisit this issue after the API issues for vector length > 1024-bits are fixed in future.

If so, we need at least some comments or even length check to not inline for unsupported vector lengths?

Copy link
Collaborator

@XiaohongGong XiaohongGong May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @nsjian ! Thanks!

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use ge 1024 , we will add two more nodes. It is a extra cost here?

// Currently it works well for vector_length <= 1024-bits.
// for vector_length > 1024, we don't support now
Copy link
Collaborator

@XiaohongGong XiaohongGong Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need any vector length check or block for vector length >1024?

Copy link
Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. It is a problem here. I will do that in next commit.

Copy link
Collaborator

@XiaohongGong XiaohongGong Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @nsjian , can we currently make it unsupported as well for 1024-bits? It doesn't need to change any IR if 1024-bits is not supported. We can revisit this issue in future for vector length >= 1024-bits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 participants