Skip to content

Conversation

@sviswa7
Copy link

@sviswa7 sviswa7 commented Aug 19, 2024

Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.

Summary of changes is as follows:

  1. The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
  2. Intrinsic for wrapIndexes and selectFrom to generate efficient code

For the following source:

    public void test() {
        var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
        for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
            var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
            index.selectFrom(inpvect).intoArray(byteres, j);
        }
    }

The code generated for inner main now looks as follows:
;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
0x00007f40d02274d0: movslq %ebx,%r13
0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
0x00007f40d022751f: add $0x40,%ebx
0x00007f40d0227522: cmp %r8d,%ebx
0x00007f40d0227525: jl 0x00007f40d02274d0

Best Regards,
Sandhya


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Change requires CSR request JDK-8340142 to be approved
  • Commit message must refer to an issue

Issues

  • JDK-8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes (Enhancement - P4)
  • JDK-8340142: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634
$ git checkout pull/20634

Update a local copy of the PR:
$ git checkout pull/20634
$ git pull https://git.openjdk.org/jdk.git pull/20634/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20634

View PR using the GUI difftool:
$ git pr show -t 20634

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20634.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 19, 2024

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 19, 2024

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes

Reviewed-by: jbhateja, psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 261 new commits pushed to the master branch:

  • d2e7708: 8341367: Problemlist ShapeNotSetSometimes.java on macOS
  • 0314973: 8341060: Cleanup statics in HeapDumper
  • 021bf63: 8340458: Open source additional Component tests (part 2)
  • 9a7817b: 8340988: Update jdk/jfr/event/gc/collection tests to accept "CodeCache GC Threshold" as valid GC reason
  • f2a767f: 8340907: Open source closed frame tests # 2
  • 7b1e6f8: 8337389: Parallel: Remove unnecessary forward declarations in psScavenge.hpp
  • 2120a84: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
  • 684d246: 8341242: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183
  • 7cc7c08: 8337493: [JVMCI] Number of libgraal threads might be too low
  • f7c7958: 8340420: ZGC: Should call vm_shutdown_during_initialization if initialization fails
  • ... and 251 more: https://git.openjdk.org/jdk/compare/81ff91ef27a6a856ae2c453a9a9b8333b91da3ab...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Aug 19, 2024

@sviswa7 The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Aug 19, 2024
@PaulSandoz
Copy link
Member

API shapes are good!

I see you intrinsified selectFrom which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression v.rearrange(this.toShuffle()). That way we can better generate an optimal set of instructions?

Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference.

@PaulSandoz
Copy link
Member

PaulSandoz commented Aug 22, 2024

Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java.

@sviswa7
Copy link
Author

sviswa7 commented Aug 22, 2024

API shapes are good!

I see you intrinsified selectFrom which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression v.rearrange(this.toShuffle()). That way we can better generate an optimal set of instructions?

Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference.

Yes, I intrinsified to generate optimial set of instructions. In the expression v.rearrange(this.toShuffle()) we will do first partial wrap as part of this.toShuffle() and then full wrap as part of rearrange. In the intrinsic I am only doing full wrap. Without intrinsic, if for whatever reason the this.toShuffle() is not moved out of the loop by the JIT, we incur additional overhead of the partial wrap in the hot code path.

I saw this happening when the following is run as part of the jmh instead of being called from standalone java with a loop:
var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
index.selectFrom(inpvect).intoArray(byteres, j);
}

The perf difference between the intrinsic and no intrinsic observed in this case then is about 20%.

@PaulSandoz
Copy link
Member

I think this is good enough to promote out of draft and create a CSR for the API changes.

@sviswa7 sviswa7 changed the title Rearrangewrap 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Sep 12, 2024
@sviswa7
Copy link
Author

sviswa7 commented Sep 12, 2024

/label hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 12, 2024
@openjdk
Copy link

openjdk bot commented Sep 12, 2024

@sviswa7
The hotspot-compiler label was successfully added.

@sviswa7 sviswa7 marked this pull request as ready for review September 12, 2024 23:13
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 12, 2024
@mlbridge
Copy link

mlbridge bot commented Sep 12, 2024

Webrevs

@merykitty
Copy link
Member

merykitty commented Sep 13, 2024

Given rearrange with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a Vector into a VectorShuffle (currently we wrap all out-of-bound elements to [-VLEN, 0)). Then the rearrange with 2 vectors will also wrap similarly (all indices are & (VLEN * 2 - 1), then indices [0, VLEN) maps to the first vector and indices [VLEN, 2 * VLEN) map to the second vector). We will normalize the indices when we invoke VectorShuffle::toVector which I think is much less used than Vector::toShuffle. What do you think?

@sviswa7
Copy link
Author

sviswa7 commented Sep 13, 2024

Given rearrange with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a Vector into a VectorShuffle (currently we wrap all out-of-bound elements to [-VLEN, 0)). Then the rearrange with 2 vectors will also wrap similarly (all indices are & (VLEN * 2 - 1), then indices [0, VLEN) maps to the first vector and indices [VLEN, 2 * VLEN) map to the second vector). We will normalize the indices when we invoke VectorShuffle::toVector which I think is much less used than Vector::toShuffle. What do you think?

The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis.

@PaulSandoz
Copy link
Member

The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis.

Yes, we are trying to take smaller incremental steps. Once the we are done with this work we can step back and discuss/review what to do about shuffles.

*
* The result is the same as the expression
* {@code v.rearrange(this.toShuffle())}.
* {@code v.rearrange(this.toShuffle().wrapIndexes())}.
Copy link
Member

@PaulSandoz PaulSandoz Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we also adjusted rearrange the existing expression is fine, recommend no change here and to the mask accepting version.

*
* For each lane {@code N} of the shuffle, and for each lane
* source index {@code I=s.laneSource(N)} in the shuffle,
* source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pseudo code below starting at line 2644 needs adjusting to:

Vector<E> r = this.rearrange(s);
return broadcast(0).blend(r, m);

this, ws, m,
(v1, s_, m_) -> v1.uOp((i, a) -> {
int ei = s_.laneSource(i);
return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ei < 0 test is redundant.

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Sep 13, 2024
@sviswa7
Copy link
Author

sviswa7 commented Sep 13, 2024

@PaulSandoz Thanks a lot for the review. I have addressed your review comments.

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java changes are good (I created a CSR). The approach in HotSpot looks good to me, but need HotSpot reviewers.

@sviswa7
Copy link
Author

sviswa7 commented Sep 16, 2024

@PaulSandoz Thanks a lot for the review and the CSR. I will look forward to Hotspot review and CSR progress/approval.

@PaulSandoz
Copy link
Member

Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java.

Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/

@sviswa7
Copy link
Author

sviswa7 commented Sep 17, 2024

Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java.

Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/

Thanks Paul!

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the name shuffleWrapIndexes and inline_vector_shuffle_wrap_indexes.

Are you shuffling wrap-indexes? I don't know what that would even mean. I think you should name it wrapShuffleIndexes. Or is there any naming convention in the VectorAPI that prevents this?

@sviswa7
Copy link
Author

sviswa7 commented Sep 18, 2024

I'm a bit confused by the name shuffleWrapIndexes and inline_vector_shuffle_wrap_indexes.

Are you shuffling wrap-indexes? I don't know what that would even mean. I think you should name it wrapShuffleIndexes. Or is there any naming convention in the VectorAPI that prevents this?

Agree, wrapShuffleIndexes makes more sense. I will make the change.

Copy link
Member

@jatin-bhateja jatin-bhateja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sviswa7 , some comments, overall patch looks good to me.

Best Regards,
Jatin

@sviswa7
Copy link
Author

sviswa7 commented Sep 19, 2024

Thanks a lot @jatin-bhateja. I have implemented your review comments.

@sviswa7
Copy link
Author

sviswa7 commented Sep 19, 2024

Thanks a lot @eme64 for the review. I have implemented your review comment.

Copy link
Member

@jatin-bhateja jatin-bhateja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sviswa7 , LGTM

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed csr Pull request needs approved CSR before integration labels Oct 1, 2024
@sviswa7
Copy link
Author

sviswa7 commented Oct 1, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Oct 1, 2024

Going to push as commit 83dcb02.
Since your change was applied there have been 261 commits pushed to the master branch:

  • d2e7708: 8341367: Problemlist ShapeNotSetSometimes.java on macOS
  • 0314973: 8341060: Cleanup statics in HeapDumper
  • 021bf63: 8340458: Open source additional Component tests (part 2)
  • 9a7817b: 8340988: Update jdk/jfr/event/gc/collection tests to accept "CodeCache GC Threshold" as valid GC reason
  • f2a767f: 8340907: Open source closed frame tests # 2
  • 7b1e6f8: 8337389: Parallel: Remove unnecessary forward declarations in psScavenge.hpp
  • 2120a84: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
  • 684d246: 8341242: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183
  • 7cc7c08: 8337493: [JVMCI] Number of libgraal threads might be too low
  • f7c7958: 8340420: ZGC: Should call vm_shutdown_during_initialization if initialization fails
  • ... and 251 more: https://git.openjdk.org/jdk/compare/81ff91ef27a6a856ae2c453a9a9b8333b91da3ab...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 1, 2024
@openjdk openjdk bot closed this Oct 1, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 1, 2024
@openjdk
Copy link

openjdk bot commented Oct 1, 2024

@sviswa7 Pushed as commit 83dcb02.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

5 participants