8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota #81

Open: wants to merge 4 commits into base: vectorIntrinsics

@@ -454,11 +454,23 @@ bool LibraryCallKit::inline_vector_shuffle_iota() {
// Wrap the indices greater than lane count.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
} else {
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
Node* mask = nullptr;
ConINode* pred_node = nullptr;
Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));

if (Matcher::supports_unsigned_vector_comparison(num_elem, elem_bt)) {
pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ugt));
mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ugt, bcast_lane_cnt, res, pred_node, vt));
} else {
// Currently this works for vector_length <= 1024 bits.
// vector_length > 1024 bits is not supported yet.
Collaborator

@XiaohongGong XiaohongGong Jun 3, 2021

Does it need any vector length check or block for vector length >1024?

Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei Jun 3, 2021

Thank you for your review. This is indeed a problem. I will fix it in the next commit.

Collaborator

@XiaohongGong XiaohongGong Jun 4, 2021

As discussed with @nsjian, can we also make 1024-bit vectors unsupported for now? No IR change is needed if 1024 bits is not supported. We can revisit this issue in the future for vector lengths >= 1024 bits.

// TODO: remove this branch if all archs support ugt
pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));
Node * lane_cnt_tmp = gvn().makecon(TypeInt::make(num_elem - 1));
Node * bcast_lane_cnt_tmp = gvn().transform(VectorNode::scalar2vector(lane_cnt_tmp, num_elem, type_bt));
mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt_tmp, res, pred_node, vt));
}
// Make the indices greater than lane count as -ve values. This matches the java side implementation.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
Node * biased_val = gvn().transform(VectorNode::make(Op_SubI, res, bcast_lane_cnt, num_elem, elem_bt));
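
A note on why the signed `ge` comparison is fragile for byte lanes. This is a minimal scalar sketch, not the HotSpot code: it assumes the lane count is broadcast into byte-sized lanes, so that 128 lanes (1024 bits of bytes) wraps to -128. The class and method names are hypothetical.

```java
class LaneCountOverflowSketch {
    /** Signed per-lane test, as a signed byte comparison evaluates it: laneCnt >= idx. */
    static boolean signedGe(byte laneCnt, byte idx) {
        return laneCnt >= idx;
    }

    /** Unsigned per-lane test: laneCnt >u idx, with both bytes read as values in 0..255. */
    static boolean unsignedGt(byte laneCnt, byte idx) {
        return Byte.toUnsignedInt(laneCnt) > Byte.toUnsignedInt(idx);
    }

    public static void main(String[] args) {
        byte laneCnt512 = (byte) (512 / 8);   // 64 lanes, fits in a signed byte
        byte laneCnt1024 = (byte) (1024 / 8); // 128 lanes, wraps to -128
        byte idx = 3;                         // an index that is clearly in range

        System.out.println("512-bit,  signed ge:   " + signedGe(laneCnt512, idx));
        System.out.println("1024-bit, signed ge:   " + signedGe(laneCnt1024, idx));
        System.out.println("1024-bit, unsigned gt: " + unsignedGt(laneCnt1024, idx));
    }
}
```

Under that assumption, the signed test rejects an in-range index once the lane count reaches 128, while the unsigned test still accepts it. This is consistent with the problem only appearing at 1024 bits and above.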
@@ -0,0 +1,87 @@
/*
* Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/

package compiler.vectorapi;

import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;
import jdk.incubator.vector.VectorShuffle;

import org.testng.Assert;
import org.testng.annotations.Test;


/*
* @test
* @bug 8266720
* @modules jdk.incubator.vector
* @run testng/othervm compiler.vectorapi.TestVectorShuffleIotaByte1024
Member

@PaulSandoz PaulSandoz May 12, 2021

Perhaps the test can be annotated, declaring it should only execute on ARM/SVE platforms. See the use of the @requires clause used in other JDK tests.

Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei May 13, 2021

Thank you for your review. I think this test applies to any architecture where ByteVector.SPECIES_MAX is 1024 bits wide.

Member

@PaulSandoz PaulSandoz May 13, 2021

But we know which architectures don't support it: x86, PPC, etc.

I am unsure why the existing shuffle tests do not catch this problem. In fact, I would prefer we focus on that, if we can, rather than adding a specific test. Would you mind looking to see if we can expand on the existing shuffleTest?

Collaborator Author

@Wanghuang-Huawei Wanghuang-Huawei Jun 4, 2021

  • Q: Why do the existing shuffle tests not catch this problem?
  • A: Because the problem only shows up with vector_length >= 1024 bits. x86 cannot provide that environment, since its widest registers are 512 bits (AVX-512).

*/

@Test
public class TestVectorShuffleIotaByte1024 {
static final VectorSpecies<Byte> SPECIESb_1024 = ByteVector.SPECIES_MAX;

static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 50000);

static final byte[] ab_1024 = {50, 49, 47, 53, 47, 49, 50, 48, 50, 32, 46, 116, 105, 32, 115,
110, 101, 104, 116, 103, 110, 101, 114, 116, 115, 32, 101,
99, 110, 101, 115, 101, 114, 112, 44, 101, 118, 111, 108,
32, 115, 110, 101, 112, 114, 97, 104, 115, 32, 101, 99, 110,
101, 115, 98, 65, 46, 117, 111, 121, 32, 101, 118, 111, 108,
32, 73, 46, 103, 110, 97, 117, 72, 32, 71, 78, 65, 87, 45, 45,
33, 117, 111, 121, 32, 103, 110, 105, 115, 115, 105, 77, 46, 117,
111, 121, 32, 111, 116, 32, 114, 101, 116, 116, 101, 108, 32,
104, 116, 52, 32, 121, 109, 32, 115, 105, 32, 115, 105, 104, 116,
44, 121, 116, 101, 101, 119, 83};

static final byte[] expected_1024 = {0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48,
51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96,
99, 102, 105, 108, 111, 114, 117, 120, 123, 126, -127, -124,
-121, -118, -115, -112, -109, -106, -103, -100, -97, -94, -91,
-88, -85, -82, -79, -76, -73, -70, -67, -64, -61, -58, -55, -52,
-49, -46, -43, -40, -37, -34, -31, -28, -25, -22, -19, -16, -13,
-10, -7, -4, -1, -126, -123, -120, -117, -114, -111, -108, -105,
-102, -99, -96, -93, -90, -87, -84, -81, -78, -75, -72, -69, -66,
-63, -60, -57, -54, -51, -48, -45, -42, -39, -36, -33, -30, -27,
-24, -21, -18, -15, -12, -9, -6, -3};

static void testShuffleIota_1024() {
ByteVector bv = (ByteVector) VectorShuffle.iota(SPECIESb_1024, 0, 3, false).toVector();
bv.intoArray(ab_1024, 0);
}

static void testIota_1024() {
for (int ic = 0; ic < INVOC_COUNT; ic++) {
testShuffleIota_1024();
}
Assert.assertEquals(ab_1024, expected_1024);
}

@Test
static void testIota() {
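// A 1024-bit byte species has 128 lanes, so check the bit size rather than the lane count.
if (SPECIESb_1024.vectorBitSize() == 1024) {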
testIota_1024();
}
}
}
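
For reference, the `expected_1024` table in the test can be reproduced with a small scalar model. This is a sketch under two assumptions: indices are computed in `int` arithmetic, and an out-of-range index is partially wrapped as `(idx & (lanes - 1)) - lanes`, mirroring the `Op_AndI` / `Op_SubI` pair in the C++ hunk. `IotaReferenceSketch` and `expectedIota` are hypothetical names, not part of the JDK or this PR.

```java
class IotaReferenceSketch {
    /**
     * Scalar model of a non-wrapping iota shuffle over byte lanes.
     * Assumes lanes is a power of two, so (lanes - 1) can serve as the wrap mask.
     */
    static byte[] expectedIota(int lanes, int start, int step) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            int idx = start + i * step;
            if (idx < 0 || idx >= lanes) {
                // Out of range: wrap into [0, lanes) and bias by -lanes to make it negative.
                idx = (idx & (lanes - 1)) - lanes;
            }
            out[i] = (byte) idx;
        }
        return out;
    }

    public static void main(String[] args) {
        // 1024-bit vector of bytes: 128 lanes, start 0, step 3, as in the test.
        byte[] expected = expectedIota(128, 0, 3);
        System.out.println(java.util.Arrays.toString(expected));
    }
}
```

Generating the table this way would also let the same check run for other species sizes, which may help with extending the existing shuffle tests instead of hard-coding a 1024-bit array.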