Conversation

@jatin-bhateja
Member

@jatin-bhateja jatin-bhateja commented May 29, 2024

Currently, the inline expansion of the vector-to-shuffle conversion simply typecasts the vector holding the indexes to a byte vector [1], whereas the fallback implementation [2] also wraps the indexes into the valid index range [0, VEC_LEN-1] or generates a negative index for exceptional (out-of-bounds) indices.

This patch extends the inline expander for the conversion to match the fallback implementation. This imposes a performance tax of around 20% on the Vector.toShuffle() intrinsic, but it fixes the functional bug.
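As a rough illustration of the fallback behaviour described above, the partial wrap can be modelled per lane in scalar Java. This is a minimal sketch, not the JDK source: the class name `PartialWrapSketch` and the method `partiallyWrapIndex` are hypothetical, and the lane count is assumed to be a power of two.

```java
public class PartialWrapSketch {
    // Hypothetical scalar model of the per-lane partial wrap.
    // Assumes laneCount is a power of two.
    public static int partiallyWrapIndex(int index, int laneCount) {
        // Wrap into [0, laneCount-1] by masking the low bits.
        int wrapped = index & (laneCount - 1);
        // In-range indexes pass through unchanged; out-of-range ones are
        // biased by -laneCount so they land in [-laneCount, -1].
        return (wrapped == index) ? index : wrapped - laneCount;
    }

    public static void main(String[] args) {
        int laneCount = 8;
        for (int index : new int[] {0, 7, 8, 11, -1}) {
            System.out.println(index + " -> " + partiallyWrapIndex(index, laneCount));
        }
    }
}
```

With eight lanes, valid indexes such as 0 and 7 are returned as they are, while 8 and 11 become -8 and -5. A plain typecast to byte would leave 8 and 11 as positive, out-of-range values.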

Kindly review and share your feedback.

Best Regards,
Jatin

PS: Patch also fixes an incorrectness issue reported with JDK-8332118

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2352
[2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java#L58


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19442/head:pull/19442
$ git checkout pull/19442

Update a local copy of the PR:
$ git checkout pull/19442
$ git pull https://git.openjdk.org/jdk.git pull/19442/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19442

View PR using the GUI difftool:
$ git pr show -t 19442

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19442.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 29, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 29, 2024

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel

Reviewed-by: sviswanathan, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 141 new commits pushed to the master branch:

  • f7dbb98: 8333086: Using Console.println is unnecessarily slow due to JLine initalization
  • 9b3694c: 8319822: Use a linear-time algorithm for assert_different_registers()
  • f73922b: 8333235: vmTestbase/nsk/jdb/kill/kill001/kill001.java fails with C1
  • 5dcb7a6: 8160755: bug6492108.java test fails with exception Image comparison failed at (0, 0) for image 4 in GTK L&F
  • 438121b: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive
  • d7d1afb: 8206447: InflaterInputStream.skip receives long but it's limited to Integer.MAX_VALUE
  • 7acfba2: 8327650: Test java/nio/channels/DatagramChannel/StressNativeSignal.java timed out
  • c5c0867: 8333252: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: must find OpaqueLoop* nodes
  • d85b0ca: 8332457: Examine startup overheads from JDK-8294961
  • 326dbb1: 8312436: CompletableFuture never completes when 'Throwable.toString()' method throws Exception
  • ... and 131 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 29, 2024
@openjdk

openjdk bot commented May 29, 2024

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 29, 2024
@mlbridge

mlbridge bot commented May 29, 2024

Webrevs

Comment on lines 2408 to 2411
if (is_vector_shuffle(vbox_klass_to)) {
op = wrap_indexes(op, num_elem_to, elem_bt_to);
}

wrap_indexes is needed only for the two-vector rearrange. It looks to me that doing wrap_indexes here at the conversion would also force it on the single-vector rearrange (or selectFrom), and thereby reduce performance for that case as well. Please note that the single-vector rearrange throws "IndexOutOfBoundsException" and doesn't need to wrap.

Please ignore the above comment. I verified that each index is partially wrapped as part of toShuffle(). We should rename wrap_indexes() to partially_wrap_indexes() for clarity.
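To make the distinction in this thread concrete, here is a scalar sketch of how a two-vector rearrange consumes partially wrapped indexes. It assumes the documented Vector API behaviour that a non-negative index selects from the first vector and an exceptional (negative) index selects from the second vector after adding the lane count. The class and method names are hypothetical.

```java
public class TwoVectorRearrangeSketch {
    // Scalar model of a two-vector rearrange. The shuffle is assumed to hold
    // partially wrapped indexes in [-laneCount, laneCount-1].
    public static float[] rearrange(float[] v1, float[] v2, int[] shuffle) {
        int laneCount = v1.length;
        float[] res = new float[laneCount];
        for (int i = 0; i < laneCount; i++) {
            int idx = shuffle[i];
            // Normal index: take from v1. Exceptional index: take from v2.
            res[i] = (idx >= 0) ? v1[idx] : v2[idx + laneCount];
        }
        return res;
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 2f, 3f, 4f};
        float[] v2 = {10f, 20f, 30f, 40f};
        int[] shuffle = {0, -4, 3, -1};
        System.out.println(java.util.Arrays.toString(rearrange(v1, v2, shuffle)));
    }
}
```

If the shuffle held raw, unwrapped indexes instead, the `idx >= 0` test could not tell a valid index from an out-of-range positive one, which is why the conversion has to produce the negative form.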

@sviswa7 sviswa7 left a comment

Other than these two minor comments, the PR looks good to me.

for (int i = 0; i < res.length; i++) {
float expected = Float.NaN;
// Exceptional index.
if (shuf[i] < 0 || shuf[i] >= FSP.length()) {

To better match the specs, this could be:
if ( (int)shuf[i] < 0 || (int)shuf[i] >= FSP.length()) {
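One way to see why the cast can matter: a float comparison and a comparison after conversion to int can disagree for fractional values, because Java's float-to-int conversion truncates toward zero. The sketch below is only an illustration with hypothetical helper names; that this is the reviewer's motivation is an assumption.

```java
public class IndexCastSketch {
    // Bounds check performed directly on the float value.
    public static boolean isExceptionalAsFloat(float idx, int laneCount) {
        return idx < 0 || idx >= laneCount;
    }

    // Bounds check performed after converting the value to int.
    public static boolean isExceptionalAsInt(float idx, int laneCount) {
        return (int) idx < 0 || (int) idx >= laneCount;
    }

    public static void main(String[] args) {
        float idx = -0.5f; // truncates to 0 when cast to int
        System.out.println(isExceptionalAsFloat(idx, 8)); // true
        System.out.println(isExceptionalAsInt(idx, 8));   // false
    }
}
```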

@sviswa7 sviswa7 left a comment

Looks good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 3, 2024
@jatin-bhateja
Member Author

jatin-bhateja commented Jun 3, 2024

Hi @vnkozlov / @TobiHartmann / @iwanowww / @eme64 , please let me know if it's good for integration.

@jatin-bhateja
Member Author

Hi @TobiHartmann , @vnkozlov , please let me know if it's good to integrate.

@vnkozlov
Contributor

vnkozlov commented Jun 4, 2024

Please wait for our review and testing.

Contributor

@vnkozlov vnkozlov left a comment

I have some comments.

Comment on lines 516 to 517
const TypeVect * vt = TypeVect::make(elem_bt, num_elem);
const Type * type_bt = Type::get_const_basic_type(elem_bt);
Contributor

Please remove space between a type and *.

return true;
}

Node* LibraryCallKit::partially_wrap_indexes(Node* index_vec, int num_elem, BasicType elem_bt) {
Contributor

Can you add a comment with pseudocode showing what this method does?

!arch_supports_vector(Op_VectorMaskCmp, num_elem_to, elem_bt_to, VecMaskNotUsed) ||
!arch_supports_vector(Op_AndV, num_elem_to, elem_bt_to, VecMaskNotUsed) ||
!arch_supports_vector(Op_Replicate, num_elem_to, elem_bt_to, VecMaskNotUsed))) {
return false;
Contributor

Please add log_if_needed(" here too.

res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt));
Node * biased_val = gvn().transform(VectorNode::make(Op_SubVB, res, bcast_lane_cnt, vt));
res = gvn().transform(new VectorBlendNode(biased_val, res, mask));
res = partially_wrap_indexes(res, num_elem, elem_bt);
Contributor

The original spacing here was correct. The one at line 621 is wrong and has to be fixed.

// Note: Unsigned greater than comparison treat both <0 and >VEC_LENGTH indices as out-of-bound
// indexes.
Node* LibraryCallKit::partially_wrap_indexes(Node* index_vec, int num_elem, BasicType elem_bt) {
assert(elem_bt == T_BYTE, "");
Contributor

Write a message in the assert: why is it limited to byte?
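The note in the quoted code comment, that a single unsigned comparison catches both negative and too-large indexes, can be checked with a small scalar sketch on byte values. The helper name `isOutOfBounds` is hypothetical, and the sketch models only the comparison, not the C2 node graph.

```java
public class UnsignedBoundsSketch {
    // Treat the byte index as unsigned: negative bytes become 128..255,
    // so one "greater than laneCount-1" test flags both <0 and >=laneCount.
    public static boolean isOutOfBounds(byte index, int laneCount) {
        return Byte.toUnsignedInt(index) > laneCount - 1;
    }

    public static void main(String[] args) {
        int laneCount = 8;
        for (byte index : new byte[] {0, 7, 8, -1, -128}) {
            System.out.println(index + " out of bounds: " + isOutOfBounds(index, laneCount));
        }
    }
}
```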

const Type* type_bt = Type::get_const_basic_type(elem_bt);

Node* mod_val = gvn().makecon(TypeInt::make(num_elem-1));
Node* bcast_mod = gvn().transform(VectorNode::scalar2vector(mod_val, num_elem, type_bt));
Contributor

Naming issue: this is not the result of the mod, so "mod" is a bit misleading. I would use mask, as it is used as a mask in the AndV below.

Contributor

Also: it seems to me that you are duplicating these 4 lines above from its call-site. I wonder if this means that you are slicing the boundary of your new method right, or if maybe the whole if-else block from the call-site should be a new method?

Member Author

> Also: it seems to me that you are duplicating these 4 lines above from its call-site. I wonder if this means that you are slicing the boundary of your new method right, or if maybe the whole if-else block from the call-site should be a new method?

The duplication you are pointing out in the code may not translate into duplicated IR, since GVN implicitly promotes sharing based on a node's hash value, which is a function of the node's opcode and inputs.
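The sharing argument above can be pictured with a toy value-numbering table. This is a simplified sketch, not HotSpot's GVN: the `Node` record, the `transform` method, and the opcode strings are hypothetical stand-ins, and nodes are keyed purely on opcode and inputs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValueNumberingSketch {
    // Toy IR node: equality and hash are derived from opcode and inputs.
    public record Node(String opcode, List<Node> inputs) {}

    private final Map<Node, Node> table = new HashMap<>();

    // Return the already-registered equivalent node if one exists,
    // otherwise register and return the new node.
    public Node transform(Node n) {
        Node existing = table.putIfAbsent(n, n);
        return existing != null ? existing : n;
    }

    public static void main(String[] args) {
        ValueNumberingSketch gvn = new ValueNumberingSketch();
        // The same constant and broadcast are built twice, as if from two call sites.
        Node con1 = gvn.transform(new Node("ConI:7", List.of()));
        Node bcast1 = gvn.transform(new Node("Replicate", List.of(con1)));
        Node con2 = gvn.transform(new Node("ConI:7", List.of()));
        Node bcast2 = gvn.transform(new Node("Replicate", List.of(con2)));
        System.out.println(bcast1 == bcast2); // true: one shared node
    }
}
```

In this model, textually duplicated construction code yields a single node in the table. Whether that makes the source-level duplication acceptable is the separate readability question raised by the reviewer.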

* @bug 8332119
* @summary Incorrect IllegalArgumentException for C2 compiled permute kernel
* @modules jdk.incubator.vector
* @requires vm.compiler2.enabled
Contributor

Is it necessary to restrict this to C2? Maybe this test tickles something for other compilers as well.

* @requires vm.compiler2.enabled
* @library /test/lib /
* @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xbatch -XX:-TieredCompilation -XX:CompileOnly=TestTwoVectorPermute::micro compiler.vectorapi.TestTwoVectorPermute
* @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xbatch -XX:-TieredCompilation compiler.vectorapi.TestTwoVectorPermute
Contributor

I would also add a run without -XX:-TieredCompilation, that could lead to different compilation patterns, and increase our test coverage.

public class TestTwoVectorPermute {
public static final VectorSpecies<Float> FSP = FloatVector.SPECIES_256;

public static void validate(float [] res, float [] shuf, float [] src1, float [] src2) {
Contributor

Suggested change
- public static void validate(float [] res, float [] shuf, float [] src1, float [] src2) {
+ public static void validate(float[] res, float[] shuf, float[] src1, float[] src2) {

Contributor

Similar issues below.

Contributor

@vnkozlov vnkozlov left a comment

Good. Tobias ran testing for v01 and it passed.

@vnkozlov
Contributor

vnkozlov commented Jun 5, 2024

Please, answer Emanuel's questions/suggestions before integration.

@jatin-bhateja
Member Author

Thanks @sviswa7 , @vnkozlov , @eme64 , your comments have been addressed. Integrating the patch.

@jatin-bhateja
Member Author

/integrate

@openjdk

openjdk bot commented Jun 5, 2024

Going to push as commit 4c09d9f.
Since your change was applied there have been 141 commits pushed to the master branch:

  • f7dbb98: 8333086: Using Console.println is unnecessarily slow due to JLine initalization
  • 9b3694c: 8319822: Use a linear-time algorithm for assert_different_registers()
  • f73922b: 8333235: vmTestbase/nsk/jdb/kill/kill001/kill001.java fails with C1
  • 5dcb7a6: 8160755: bug6492108.java test fails with exception Image comparison failed at (0, 0) for image 4 in GTK L&F
  • 438121b: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive
  • d7d1afb: 8206447: InflaterInputStream.skip receives long but it's limited to Integer.MAX_VALUE
  • 7acfba2: 8327650: Test java/nio/channels/DatagramChannel/StressNativeSignal.java timed out
  • c5c0867: 8333252: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: must find OpaqueLoop* nodes
  • d85b0ca: 8332457: Examine startup overheads from JDK-8294961
  • 326dbb1: 8312436: CompletableFuture never completes when 'Throwable.toString()' method throws Exception
  • ... and 131 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 5, 2024
@openjdk openjdk bot closed this Jun 5, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 5, 2024
@openjdk

openjdk bot commented Jun 5, 2024

@jatin-bhateja Pushed as commit 4c09d9f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jatin-bhateja jatin-bhateja deleted the JDK-8332119 branch August 1, 2024 08:52

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

4 participants