-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. #17261
Conversation
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
@jatin-bhateja The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jatin-bhateja this looks like a great improvement!
I have a few questions and requests below.
FYI, this feels very inspiring. I'm dreaming of a day where we could do this filtering in the auto-vectorizer directly.
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant?
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. |
@jatin-bhateja I think you can expand them in the matcher into several |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, like @merykitty suggests: you can do a platform-dependent expansion.
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
Hi @merykitty , @eme64 , in principle platform specific lowering is a good idea where ever useful, our main concern here is to identify a loop invariant constant mask in matcher patterns and save the cost of re-loading from a permute table index. Existing loop invariant analysis moves invariant masks out of loop and GCM should be able to move expanded load from permute table out of loop. But this looks very restrictive and will mainly be useful for constant one hot bit mask pattern. A constant mask may have more than one set bits and in such a case we will need to generate multiple loads from permute tables and handle multiple expansion scenarios. I think we can defer that complexity for the time being. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are almost there! 😊
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Show resolved
Hide resolved
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! The patch looks good to me now.
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 26 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, except for one detail ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll just run the testing again, and then I will approve this :)
Hi @eme64 , let us know test results. Best Regards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing passed, looks good now :)
Nice progress, the code now is simpler and much more understandable!
/integrate |
Going to push as commit 6d36eb7.
Your commit was automatically rebased without conflicts. |
@jatin-bhateja Pushed as commit 6d36eb7. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi,
Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets.
Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set.
These are very frequently used APIs in columnar database filter operation.
Implementation uses a lookup table to record permute indices. Table index is computed using
mask argument of compress/expand operation.
Following are the performance number of JMH micro included with the patch.
Kindly review and share your feedback.
Best Regards,
Jatin
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261
$ git checkout pull/17261
Update a local copy of the PR:
$ git checkout pull/17261
$ git pull https://git.openjdk.org/jdk.git pull/17261/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17261
View PR using the GUI difftool:
$ git pr show -t 17261
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17261.diff
Webrev
Link to Webrev Comment