Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8309531: Incorrect result with unwrapped iotaShuffle. #14700

Closed
wants to merge 8 commits into from

Conversation

jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Jun 28, 2023

Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices.

  1. Disable intrinsification if effective index do not lie within byte value range.
  2. Use GT predicate while computing comparison mask for all the indices above vector length.

No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle.

This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations
with JDK-8310691.

Kindly review and share feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8309531: Incorrect result with unwrapped iotaShuffle. (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700
$ git checkout pull/14700

Update a local copy of the PR:
$ git checkout pull/14700
$ git pull https://git.openjdk.org/jdk.git pull/14700/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14700

View PR using the GUI difftool:
$ git pr show -t 14700

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14700.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 28, 2023

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 28, 2023
@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@openjdk
Copy link

openjdk bot commented Jun 28, 2023

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 28, 2023
@openjdk
Copy link

openjdk bot commented Jun 28, 2023

@jatin-bhateja The hotspot-compiler label was already applied.

@mlbridge
Copy link

mlbridge bot commented Jun 28, 2023

Copy link

@sviswa7 sviswa7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. I have couple of comments.

if (start_val->is_con() && step_val->is_con()) {
int effective_min_index = start_val->get_con();
int effective_max_index = start_val->get_con() + step_val->get_con() * (num_elem - 1);
effective_indices_in_range = effective_max_index > effective_min_index && effective_min_index >= -128 && effective_max_index <= 127;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

effective_max_index >= effective_min_index (step_val could be zero).

Comment on lines 681 to 688
ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::gt));
Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
const TypeVect* vmask_type = TypeVect::makemask(elem_bt, num_elem);
Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vmask_type));
Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::gt, bcast_lane_cnt, res, pred_node, vmask_type));

// Make the indices greater than lane count as -ve values. This matches the java side implementation.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
Node * biased_val = gvn().transform(VectorNode::make(Op_SubI, res, bcast_lane_cnt, num_elem, elem_bt));
// Make the indices greater than lane count as -ve values to match the java side implementation.
res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that here we are setting the mask to be true for within range good lane indices.
What happens if the index is -ve? The BoolTest:gt would not catch that as it would still be true.
We could instead check for equality of indices before and after the AndV at line 688 below. If not equal then value was out of range.
May be I am missing something here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, comparison predicate should be UGT.

Comment on lines 625 to 626

bool step_multiply = !step_val->is_con() || !is_power_of_2(step_val->get_con());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better moving this definition above its usage (e.g. line-634)?

if(do_wrap) {
// Wrap the indices greater than lane count.
res = gvn().transform(VectorNode::make(Op_AndI, res, bcast_mod, num_elem, elem_bt));
res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity Style: please remove one space before res =

Comment on lines 612 to 617
bool effective_indices_in_range = false;
if (start_val->is_con() && step_val->is_con()) {
int effective_min_index = start_val->get_con();
int effective_max_index = start_val->get_con() + step_val->get_con() * (num_elem - 1);
effective_indices_in_range = effective_min_index >= -128 && effective_max_index <= 127;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why we need to fall-back to java implementation if the indices are in-effective for constant vals? While if the start_val and step_val are not all constants, for in-effective-indices, it subs to the lanecount? Is there any difference for constant and variable inputs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is moving this effective indice checking for constant values in java level. C2 compiler may optimize out it for most cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is moving this effective indice checking for constant values in java level. C2 compiler may optimize out it for most cases?

I wanted to just fix incorrectness issue with this PR given that we already have another JBS (JDK-8310691) for shuffle related overhaul in progress.

May I ask why we need to fall-back to java implementation if the indices are in-effective for constant vals? While if the start_val and step_val are not all constants, for in-effective-indices, it subs to the lanecount? Is there any difference for constant and variable inputs?

Inline expander operate over byte vectors currently, this check ensures that we do not overflow the byte value range while computing effective index for unwrapped argument, otherwise subsequent comparison mask may be incorrectly.

Copy link

@sviswa7 sviswa7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk
Copy link

openjdk bot commented Jun 30, 2023

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8309531: Incorrect result with unwrapped iotaShuffle.

Reviewed-by: sviswanathan, xgong, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 73 new commits pushed to the master branch:

  • 7b3c2dc: 8311122: Fix typos in java.base
  • 607ddaa: 8310997: missing @SInCE tags in jdk.httpserver
  • 7655b48: 8311001: missing @SInCE info in jdk.net
  • 711cddd: 8311249: Remove unused MemAllocator::obj_memory_range
  • 514816e: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch.
  • 60544f9: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0
  • 0916e6a: 8311092: Please disable runtime/jni/nativeStack/TestNativeStack.java on armhf
  • d8a0121: 8311109: tautological-compare warning in awt_Win32GraphicsDevice.cpp
  • b9198f9: 8254566: Clarify the spec of ClassLoader::getClassLoadingLock for non-parallel capable loader
  • f393975: 8310743: assert(reserved_rgn != nullptr) failed: Add committed region, No reserved region found
  • ... and 63 more: https://git.openjdk.org/jdk/compare/ef71c3215e2b37d1f79c080896f3b7fc0b7d3ea0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 30, 2023
@TobiHartmann
Copy link
Member

TestVectorShuffleIota.java fails with bad AD file on macosx-aarch64-debug:

o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

--N: o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

   --N: o707  Binary  === _ o646 o643  [[ o647 ]] 
   _Binary_vReg_vReg  0  _Binary_vReg_vReg

      --N: o646  ReplicateB  === _ o167  [[ o707 o649 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

      --N: o643  LShiftVB  === _ o638 o642  [[ o707 o648 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

   --N: o645  ConI  === o0  [[ o647 ]]  #int:17
   IMMI  0  IMMI
   IMMI_GT_1  0  IMMI_GT_1
   IMMI_POSITIVE  0  IMMI_POSITIVE
   IMMI_CMPU_COND  0  IMMI_CMPU_COND
   IMMI26  0  IMMI26
   IMMI19  0  IMMI19
   IMMIU7  0  IMMIU7
   IMMIU12  0  IMMIU12
   IMMIOFFSET  0  IMMIOFFSET
   IMMIOFFSET1  0  IMMIOFFSET1
   IMMIOFFSET2  0  IMMIOFFSET2
   IMMIOFFSET4  0  IMMIOFFSET4
   IMMIOFFSET8  0  IMMIOFFSET8
   IMMIOFFSET16  0  IMMIOFFSET16
   IMMI8  0  IMMI8
   IMMI8_SHIFT8  0  IMMI8_SHIFT8
   IMMBADDSUBV  0  IMMBADDSUBV
   IMMIADDSUB  0  IMMIADDSUB
   IMMIADDSUBV  0  IMMIADDSUBV
   IMMBLOG  0  IMMBLOG
   IREGI  100  loadConI
   IREGINOSP  100  loadConI
   IREGI_R0  100  loadConI
   IREGI_R2  100  loadConI
   IREGI_R3  100  loadConI
   IREGI_R4  100  loadConI
   IREGIORL2I  100  IREGI

return false;
}
if (!arch_supports_vector(Op_AndV, num_elem, elem_bt, VecMaskNotUsed)) {

if (do_wrap &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do_wrap should be !do_wrap instead?

@XiaohongGong
Copy link

TestVectorShuffleIota.java fails with bad AD file on macosx-aarch64-debug:

o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

--N: o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

   --N: o707  Binary  === _ o646 o643  [[ o647 ]] 
   _Binary_vReg_vReg  0  _Binary_vReg_vReg

      --N: o646  ReplicateB  === _ o167  [[ o707 o649 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

      --N: o643  LShiftVB  === _ o638 o642  [[ o707 o648 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

   --N: o645  ConI  === o0  [[ o647 ]]  #int:17
   IMMI  0  IMMI
   IMMI_GT_1  0  IMMI_GT_1
   IMMI_POSITIVE  0  IMMI_POSITIVE
   IMMI_CMPU_COND  0  IMMI_CMPU_COND
   IMMI26  0  IMMI26
   IMMI19  0  IMMI19
   IMMIU7  0  IMMIU7
   IMMIU12  0  IMMIU12
   IMMIOFFSET  0  IMMIOFFSET
   IMMIOFFSET1  0  IMMIOFFSET1
   IMMIOFFSET2  0  IMMIOFFSET2
   IMMIOFFSET4  0  IMMIOFFSET4
   IMMIOFFSET8  0  IMMIOFFSET8
   IMMIOFFSET16  0  IMMIOFFSET16
   IMMI8  0  IMMI8
   IMMI8_SHIFT8  0  IMMI8_SHIFT8
   IMMBADDSUBV  0  IMMBADDSUBV
   IMMIADDSUB  0  IMMIADDSUB
   IMMIADDSUBV  0  IMMIADDSUBV
   IMMBLOG  0  IMMBLOG
   IREGI  100  loadConI
   IREGINOSP  100  loadConI
   IREGI_R0  100  loadConI
   IREGI_R2  100  loadConI
   IREGI_R3  100  loadConI
   IREGI_R4  100  loadConI
   IREGIORL2I  100  IREGI

I saw the same failure on Arch64 NEON system on linux. It seems the do_wrap should be changed to !do_wrap as I commented.

@jatin-bhateja
Copy link
Member Author

TestVectorShuffleIota.java fails with bad AD file on macosx-aarch64-debug:

o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

--N: o647  VectorMaskCmp  === _ o707 o645  [[ o650 ]]  17 #vectord[4]:{byte}

   --N: o707  Binary  === _ o646 o643  [[ o647 ]] 
   _Binary_vReg_vReg  0  _Binary_vReg_vReg

      --N: o646  ReplicateB  === _ o167  [[ o707 o649 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

      --N: o643  LShiftVB  === _ o638 o642  [[ o707 o648 ]]  #vectord[4]:{byte}
      VREG  0  VREG
      VECD  0  VECD

   --N: o645  ConI  === o0  [[ o647 ]]  #int:17
   IMMI  0  IMMI
   IMMI_GT_1  0  IMMI_GT_1
   IMMI_POSITIVE  0  IMMI_POSITIVE
   IMMI_CMPU_COND  0  IMMI_CMPU_COND
   IMMI26  0  IMMI26
   IMMI19  0  IMMI19
   IMMIU7  0  IMMIU7
   IMMIU12  0  IMMIU12
   IMMIOFFSET  0  IMMIOFFSET
   IMMIOFFSET1  0  IMMIOFFSET1
   IMMIOFFSET2  0  IMMIOFFSET2
   IMMIOFFSET4  0  IMMIOFFSET4
   IMMIOFFSET8  0  IMMIOFFSET8
   IMMIOFFSET16  0  IMMIOFFSET16
   IMMI8  0  IMMI8
   IMMI8_SHIFT8  0  IMMI8_SHIFT8
   IMMBADDSUBV  0  IMMBADDSUBV
   IMMIADDSUB  0  IMMIADDSUB
   IMMIADDSUBV  0  IMMIADDSUBV
   IMMBLOG  0  IMMBLOG
   IREGI  100  loadConI
   IREGINOSP  100  loadConI
   IREGI_R0  100  loadConI
   IREGI_R2  100  loadConI
   IREGI_R3  100  loadConI
   IREGI_R4  100  loadConI
   IREGIORL2I  100  IREGI

Hi @TobiHartmann, Fixed, can you kindly run this again through your test infra before checkin.

Copy link

@XiaohongGong XiaohongGong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AArch64 tests pass on linux. So LGTM!

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests passed, I added some style comments.

if (!arch_supports_vector(Op_AndV, num_elem, elem_bt, VecMaskNotUsed)) {

if (!do_wrap && !effective_indices_in_range) {
// FIXME: disable instrinsification for unwrapped shuffle iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add FIXME's to new code or at least file a follow-up RFE and reference it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add FIXME's to new code or at least file a follow-up RFE and reference it here.

JDK-8311305 filed to address this in a follow-up RFE.

}
if (!vlen->is_con() || !is_power_of_2(vlen->get_con()) ||
shuffle_klass->const_oop() == nullptr || !wrap->is_con()) {
if (shuffle_klass == nullptr || shuffle_klass->const_oop() == nullptr ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (shuffle_klass == nullptr || shuffle_klass->const_oop() == nullptr ||
if (shuffle_klass == nullptr || shuffle_klass->const_oop() == nullptr ||


if(!step_val->is_con() || !is_power_of_2(step_val->get_con())) {
if(step_multiply) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(step_multiply) {
if (step_multiply) {

return false;

bool step_multiply = !step_val->is_con() || !is_power_of_2(step_val->get_con());
if(step_multiply) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(step_multiply) {
if (step_multiply) {

return false;
}
} else {
if (!arch_supports_vector(Op_LShiftVB, num_elem, elem_bt, VecMaskNotUsed)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be converted to else if

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me.

@jatin-bhateja
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Jul 5, 2023

Going to push as commit d6578bf.
Since your change was applied there have been 73 commits pushed to the master branch:

  • 7b3c2dc: 8311122: Fix typos in java.base
  • 607ddaa: 8310997: missing @SInCE tags in jdk.httpserver
  • 7655b48: 8311001: missing @SInCE info in jdk.net
  • 711cddd: 8311249: Remove unused MemAllocator::obj_memory_range
  • 514816e: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch.
  • 60544f9: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0
  • 0916e6a: 8311092: Please disable runtime/jni/nativeStack/TestNativeStack.java on armhf
  • d8a0121: 8311109: tautological-compare warning in awt_Win32GraphicsDevice.cpp
  • b9198f9: 8254566: Clarify the spec of ClassLoader::getClassLoadingLock for non-parallel capable loader
  • f393975: 8310743: assert(reserved_rgn != nullptr) failed: Add committed region, No reserved region found
  • ... and 63 more: https://git.openjdk.org/jdk/compare/ef71c3215e2b37d1f79c080896f3b7fc0b7d3ea0...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 5, 2023
@openjdk openjdk bot closed this Jul 5, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 5, 2023
@openjdk
Copy link

openjdk bot commented Jul 5, 2023

@jatin-bhateja Pushed as commit d6578bf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants