8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes #20634
👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into |
|
@sviswa7 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 261 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
API shapes are good! I see you intrinsified Do you know what deficiencies there are that block us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. |
|
Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. |
Yes, I intrinsified to generate an optimal set of instructions. In the expression I saw this happening when the following is run as part of the JMH benchmark instead of being called from standalone Java with a loop: The perf difference between the intrinsic and no intrinsic observed in this case is then about 20%. |
|
I think this is good enough to promote out of draft and create a CSR for the API changes. |
|
/label hotspot-compiler |
|
@sviswa7 |
Webrevs
|
|
Given |
The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. |
Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles. |
| * | ||
| * The result is the same as the expression | ||
| * {@code v.rearrange(this.toShuffle())}. | ||
| * {@code v.rearrange(this.toShuffle().wrapIndexes())}. |
Since we also adjusted rearrange the existing expression is fine, recommend no change here and to the mask accepting version.
| * | ||
| * For each lane {@code N} of the shuffle, and for each lane | ||
| * source index {@code I=s.laneSource(N)} in the shuffle, | ||
| * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle, |
The pseudo code below starting at line 2644 needs adjusting to:
Vector<E> r = this.rearrange(s);
return broadcast(0).blend(r, m);
| this, ws, m, | ||
| (v1, s_, m_) -> v1.uOp((i, a) -> { | ||
| int ei = s_.laneSource(i); | ||
| return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei); |
The ei < 0 test is redundant.
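To illustrate the two review points above, here is a minimal scalar sketch in plain Java. It models vectors as `int[]`, shuffles as `int[]` and masks as `boolean[]`; the class and method names are hypothetical and are not part of the Vector API or of this patch. It assumes that `ws` in the diff is the shuffle after index wrapping, which is what would make the `ei < 0` test unnecessary.

```java
import java.util.Arrays;

// Scalar model only: arrays stand in for Vector, VectorShuffle and VectorMask.
final class MaskedRearrangeSketch {
    // Wrap an arbitrary index into [0, vlength). Hypothetical helper.
    static int wrapIndex(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    // Wrap every source index of a shuffle.
    static int[] wrapIndexes(int[] shuffle, int vlength) {
        int[] ws = new int[shuffle.length];
        for (int n = 0; n < shuffle.length; n++) {
            ws[n] = wrapIndex(shuffle[n], vlength);
        }
        return ws;
    }

    // Pseudo code form: r = this.rearrange(s); return broadcast(0).blend(r, m);
    static int[] rearrangeThenBlend(int[] v, int[] s, boolean[] m) {
        int[] ws = wrapIndexes(s, v.length);
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            int r = v[ws[i]];          // unmasked rearrange result for lane i
            out[i] = m[i] ? r : 0;     // blend with a zero broadcast under the mask
        }
        return out;
    }

    // Lane-wise form modelled on the lambda in the diff, without the ei < 0 test.
    // ws is assumed to be already wrapped, so ei is always a valid lane number.
    static int[] laneWise(int[] v, int[] ws, boolean[] m) {
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            int ei = ws[i];
            out[i] = !m[i] ? 0 : v[ei];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        int[] s = {5, -2, 0, 7};
        boolean[] m = {true, false, true, true};
        System.out.println(Arrays.toString(rearrangeThenBlend(v, s, m)));
        System.out.println(Arrays.toString(laneWise(v, wrapIndexes(s, v.length), m)));
    }
}
```

Both calls print `[20, 0, 10, 40]` for this input: the wrapped indexes are `{1, 2, 0, 3}`, none of them negative, and lane 1 is zeroed by the mask.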
|
@PaulSandoz Thanks a lot for the review. I have addressed your review comments. |
Java changes are good (I created a CSR). The approach in HotSpot looks good to me, but it needs HotSpot reviewers.
|
@PaulSandoz Thanks a lot for the review and the CSR. I look forward to the HotSpot review and CSR progress/approval. |
Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ |
Thanks Paul! |
I'm a bit confused by the name shuffleWrapIndexes and inline_vector_shuffle_wrap_indexes.
Are you shuffling wrap-indexes? I don't know what that would even mean. I think you should name it wrapShuffleIndexes. Or is there any naming convention in the VectorAPI that prevents this?
Agree, wrapShuffleIndexes makes more sense. I will make the change. |
Hi @sviswa7 , some comments, overall patch looks good to me.
Best Regards,
Jatin
|
Thanks a lot @jatin-bhateja. I have implemented your review comments. |
|
Thanks a lot @eme64 for the review. I have implemented your review comment. |
Thanks @sviswa7 , LGTM
|
/integrate |
|
Going to push as commit 83dcb02.
Your commit was automatically rebased without conflicts. |
Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
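As a rough illustration of the difference between checking and wrapping, the following scalar sketch models a vector and a shuffle as plain `int[]` arrays. The class and method names are hypothetical and simplified; they are not the Vector API implementation, and the "check" variant ignores the partial wrapping that shuffle construction performs. For a power-of-two vector length, wrapping is equivalent to `index & (VLENGTH - 1)`.

```java
import java.util.Arrays;

// Scalar model only: int[] stands in for both the vector and the shuffle.
final class WrapIndexSketch {
    // Simplified model of the old behaviour: an out-of-range source index is an error.
    static int checkIndex(int index, int vlength) {
        if (index < 0 || index >= vlength) {
            throw new IndexOutOfBoundsException("source index " + index);
        }
        return index;
    }

    // Model of the new behaviour: force the index into [0, vlength).
    static int wrapIndex(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    // Rearrange with wrapping: lane n of the result reads lane wrap(shuffle[n]) of v.
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int n = 0; n < v.length; n++) {
            r[n] = v[wrapIndex(shuffle[n], v.length)];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        int[] s = {3, 4, -1, 0};   // 4 and -1 are out of range for four lanes
        System.out.println(Arrays.toString(rearrange(v, s)));
    }
}
```

With wrapping, index 4 maps to lane 0 and index -1 maps to lane 3, so the sketch prints `[40, 10, 40, 10]` and no per-lane range check or exception path is needed in the loop body.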
Summary of changes is as follows:
For the following source:
The code generated for inner main now looks as follows:
;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
0x00007f40d02274d0: movslq %ebx,%r13
0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
0x00007f40d022751f: add $0x40,%ebx
0x00007f40d0227522: cmp %r8d,%ebx
0x00007f40d0227525: jl 0x00007f40d02274d0
Best Regards,
Sandhya
Progress
Issues
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634
$ git checkout pull/20634
Update a local copy of the PR:
$ git checkout pull/20634
$ git pull https://git.openjdk.org/jdk.git pull/20634/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20634
View PR using the GUI difftool:
$ git pr show -t 20634
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20634.diff
Webrev
Link to Webrev Comment